Abstract

The utilization of populations is one of the most important features of evolutionary algorithms (EAs). 
There have been many studies analyzing the impact of different population sizes on the performance of EAs. 
However, apart from a few exceptions, most such studies are based on computational experiments. The common
wisdom so far appears to be that a large population increases population diversity and thus helps an
EA. Indeed, increasing the population size has been a commonly used strategy for tuning an EA that does not
perform as well as expected on a given problem. He and Yao [8] showed theoretically that for some problem
instance classes, a population can help to reduce the runtime of an EA from exponential to polynomial time. 
This paper further analyzes the role of the population in EAs and shows rigorously that large populations are
not always useful. Conditions under which large populations can be harmful are discussed. Although the
theoretical analysis was carried out on one multimodal problem using a specific type of EA, it has wider
implications: the analysis reveals certain problem characteristics, which may be shared by the problem
considered here and by other problems, that lead to the disadvantages of large population sizes. The
analytical approach developed in this paper can also be applied to analyzing EAs on other problems.

1 Introduction 

As a crucial characteristic of evolutionary algorithms (EAs), the use of a population enables the exploration of
different parts of the search space by a number of individuals. Although over the past decades most practical
EAs have employed populations, rigorous theoretical investigations of the impact of the population on evolutionary
algorithms have mainly been carried out in the last eight years. On this issue, He and Yao [8] made one of
the first attempts by comparing the mean first hitting times of the (N + N) and (1 + 1) EAs on a
class of multimodal problems derived from the well-known OneMax problem, in order to demonstrate
the impact of the population. Later, a number of theoretical investigations were dedicated to studying the first
hitting times of EAs with either multiple parents [TTj [51 [H] or multiple offspring [TDJ H3]. It was expected that the EAs
under investigation, known as (μ + 1) and (1 + λ) EAs, could establish a bridge from analyzing the
(1 + 1) EA to studying (μ + λ) EAs. In the meantime, recent investigations of EAs
with multiple parents and offspring (e.g., (N + N) EAs) have brought the community broader
perspectives on the behavior of population-based EAs. Chen et al. analyzed the time complexity
of the (N + N) EA on some well-known unimodal problems [5] and a wide-gap problem pQ. Lehre and Yao studied
the impact of the mutation-selection balance on the performance of an (N + N) EA with linear ranking selection on
a multimodal problem jll), but the influence of different population sizes was not investigated. It is
still not clear how different population sizes influence the performance of the (N + N) EA on multimodal problems.
In this paper, we carry out theoretical investigations on this issue.

To study the effect of population size theoretically, we need a suitable measure
of performance for EAs. Most previous investigations adopt the well-known first hitting time, which is
a random variable giving the number of generations required by the EA to find the global optimum for
the first time. Here we denote by $\tau$ the first hitting time, and by $P(\tau < a)$ its cumulative distribution. We
further denote by $\tau_1$ and $\tau_2$ the first hitting times of two arbitrary EAs, say, EA-I and EA-II, respectively. If
the following conditions hold:

• $a_1(n)$ is a polynomial function of the problem size $n$;

• $a_2(n)$ is a super-polynomial function of the problem size $n$;

• $P[\tau_1 < a_1(n)]$ is super-polynomially close to 1;

• $P[\tau_2 < a_2(n)]$ is super-polynomially close to 0,

then one can conclude that EA-I is more efficient than EA-II with a probability that is super-polynomially close
to 1. A recent example that successfully uses this methodology to compare different EAs is
provided in [3]. However, in the following case, a direct comparison of the first hitting times
becomes infeasible:

• $a_1(n)$ is a polynomial function of the problem size $n$;

• $a_2(n)$ is a super-polynomial function of the problem size $n$;

• $a_3(n)$ is a super-polynomial function of the problem size $n$;

• the reciprocal of $P[\tau_1 < a_1(n)]$ is bounded from above by some polynomial function of $n$;

• the reciprocal of $P[\tau_1 > a_2(n)]$ is bounded from above by some polynomial function of $n$;

• $P[\tau_2 < a_3(n)]$ is super-polynomially close to 0.

The reason is that, with a probability that is at least the reciprocal of a polynomial, EA-I may perform as inefficiently as EA-II. Nevertheless, since EA-I still has a relatively
high probability of performing efficiently while EA-II performs inefficiently almost surely, we can still compare the
performance of EA-I and EA-II by employing the solvable rate as an alternative measure, where the solvable
rate is the probability that the EA finds the global optimum of an optimization problem within a polynomial
number of generations. The solvable rate can be considered a generalized measure based on the probability
distribution of the traditional first hitting time; it concerns the probability with which
an EA performs efficiently in general, rather than the detailed computation time for finding the optimum. In
fact, in previous investigations (e.g., [13]), the idea of the solvable rate has been used implicitly alongside
the first hitting time results, though it has not been adopted as a measure of performance.

By employing the solvable rate in this paper, we carry out a theoretical analysis of the impact of
the population size on the performance of an (N + N) EA on a multimodal problem. This multimodal problem,
called TrapZeros, has a global optimum $(1, \ldots, 1)$ and a local optimum $(0, \ldots, 0)$.
The attraction basin of the global optimum consists only of solutions whose leading substring is made up of
$\ln^2 n + 2$ consecutive 1-bits. To find the global optimum, the EA has to enter this basin of attraction first,
resisting the selection pressure that tends to preserve solutions with leading 0-bits. For the (N + N) EA on
this problem, we consider three cases for the population size: $N = 1$, $N = O(\ln n)$ and $N = \Omega(n/\ln n)$,
where the well-known (1 + 1) EA is the special case $N = 1$ of the (N + N) EA. We show that
when the population size is relatively small ($N = 1$ or $N = O(\ln n)$), the solvable rate of the (N + N) EA is still larger
than the reciprocal of some polynomial function of the problem size, which implies that the EA, if run on
an appropriate polynomial number of processors simultaneously and independently, can find the global optimum
within a polynomial number of generations. However, given a large enough population size $N = \Omega(n/\ln n)$, the
solvable rate of the (N + N) EA drops to a level that is super-polynomially close to 0, implying that
the EA cannot find the global optimum within a polynomial number of generations unless a
super-polynomial number of processors is available for the EA to run on.

The rest of the paper is organized as follows: Section 2 introduces the problem and algorithm investigated in
this paper. Section 3 presents the mathematical tools used in this paper. Section 4 analyzes the (N + N) EA
with population size $N = 1$. Section 5 presents the analytical result for the (N + N) EA with population
size $N = O(\ln n)$. Section 6 concerns the (N + N) EA with population size $N = \Omega(n/\ln n)$. Section 7 discusses
the results presented in the previous sections. Section 8 concludes the paper.



2 Problem and Algorithm 

In this section, we introduce the concrete optimization problem and EA investigated in this paper. 
2.1 Problem 

The maximization problem we consider in this paper, defined over the domain $x = (x_1, \ldots, x_n) \in \{0, 1\}^n$, is
called TrapZeros:

The TrapZeros problem is a multimodal problem, and its global optimum is $x^* = (1, \ldots, 1)$. For TrapZeros,
increasing the number of leading 0-bits in a solution may eventually lead to the local optimum $(0, \ldots, 0)$ instead of leading
to the global optimum $(1, \ldots, 1)$. To reach the attraction basin of the global optimum, an optimization algorithm
must first find some solutions whose leading substring consists of $\ln^2 n + 2$ consecutive 1-bits; otherwise,
the selection pressure of an EA will tend to preserve the solutions with leading 0-bits. To facilitate the later
investigations, we define three schemata as follows:

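$$S_1 = \{(1, 1, *, \ldots, *)\}, \qquad S_0 = \{(0, 0, *, \ldots, *)\}, \qquad S^* = \{(\underbrace{1, 1, 1, \ldots, 1}_{\ln^2 n + 2}, *, \ldots, *)\}, \qquad S^* \subset S_1,$$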

where "*" can represent either 0 or 1; $S^*$ and $S_1$ are the schemata containing the global optimum. Fig. 1
illustrates the fitness landscape of TrapZeros with respect to the schemata defined above, and shows that
the individuals belonging to $S^*$ are strictly better than any individual belonging to $S_0$, while the individuals
belonging to $S_0$ are strictly better than any individual belonging to $S_1 \setminus S^*$. Using this property, we will carry
out a rigorous analysis of an EA on TrapZeros later.
Figure 1: Illustration of the TrapZeros problem.
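To make the three schemata concrete, here is a minimal Python sketch of membership tests for $S_1$, $S_0$ and $S^*$. The text writes the prefix length as $\ln^2 n + 2$ without saying how a non-integer $\ln^2 n$ is rounded; the helper `prefix_length` assumes $\lceil \ln^2 n \rceil + 2$, and all function names here are hypothetical.

```python
import math
from typing import Sequence


def prefix_length(n: int) -> int:
    """Length of the all-ones prefix defining S*; rounding ln^2 n up is an assumption."""
    return math.ceil(math.log(n) ** 2) + 2


def in_s1(x: Sequence[int]) -> bool:
    """S_1 = {(1, 1, *, ..., *)}: the first two bits are 1."""
    return x[0] == 1 and x[1] == 1


def in_s0(x: Sequence[int]) -> bool:
    """S_0 = {(0, 0, *, ..., *)}: the first two bits are 0."""
    return x[0] == 0 and x[1] == 0


def in_s_star(x: Sequence[int]) -> bool:
    """S*: the leading prefix_length(n) bits are all 1 (hence S* is a subset of S_1)."""
    k = prefix_length(len(x))
    return len(x) >= k and all(b == 1 for b in x[:k])


# Example with n = 100, where the assumed prefix length is ceil(21.2...) + 2 = 24.
x = [1] * 24 + [0] * 76
y = [1] * 23 + [0] * 77
print(in_s_star(x), in_s_star(y), in_s1(y))  # True False True
```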



2.2 Algorithm 

The (N + N) EA studied in this paper uses equal parent and offspring population sizes. The detailed algorithm is
described as follows:

1. Initialization: $N$ initial individuals are generated uniformly at random, forming the initial population
$\xi_1$.

2. Mutation: At the $t$-th generation ($t \in \mathbb{N}^+$), the $N$ individuals in the parent population $\xi_t$ are mutated,
and the offspring population $\xi_t^{(m)}$ is obtained. Each individual in $\xi_t$ undergoes bitwise
mutation, i.e., each bit of the individual is flipped independently with probability $1/n$, where $n$
is the problem size.

3. Selection: After the mutation step at the $t$-th generation ($t \in \mathbb{N}^+$), the best $N$ individuals in the union of the parent and
offspring populations ($\xi_t \cup \xi_t^{(m)}$) are selected to form the population $\xi_{t+1}$, which is the parent population
of the $(t + 1)$-th generation. Then set $t = t + 1$ and go to the mutation step.

The EA stops when the stopping criterion is met. The above algorithm adopts truncation
selection and does not employ any recombination operator. The investigation of EAs with recombination
operators and other selection operators is left for future work.

For the (N + N) EA, the population size $N$ must be a polynomial function of the problem size
$n$; otherwise, each generation of the EA would require a super-polynomial number of fitness evaluations. When
N = 1, the above (N + N) EA degenerates to the well-known (1 + 1) EA @]. 
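The three steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the fitness function is passed in as a parameter and demonstrated on OneMax rather than TrapZeros, the all-ones string is taken as the optimum, ties in truncation selection are broken by keeping parents before offspring (the text does not specify a rule), and `n_plus_n_ea` and `one_max` are hypothetical names.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Individual = List[int]


def bitwise_mutation(x: Sequence[int], rng: random.Random) -> Individual:
    """Flip each bit of x independently with probability 1/n, where n = len(x)."""
    n = len(x)
    return [1 - b if rng.random() < 1.0 / n else b for b in x]


def n_plus_n_ea(
    fitness: Callable[[Sequence[int]], float],
    n: int,
    N: int,
    max_generations: int,
    rng: random.Random,
) -> Tuple[Optional[int], List[Individual]]:
    """(N + N) EA: bitwise mutation, truncation selection, no recombination.

    Returns (tau, population): tau is the index t of the first population xi_t
    that contains the all-ones string, or None if it is not reached in time.
    """
    optimum = [1] * n
    # 1. Initialization: xi_1 holds N bit strings drawn uniformly at random.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(N)]
    for t in range(1, max_generations + 1):
        if optimum in population:
            return t, population
        # 2. Mutation: every parent produces exactly one offspring.
        offspring = [bitwise_mutation(x, rng) for x in population]
        # 3. Selection: keep the best N of parents + offspring. Python's sort is
        #    stable even with reverse=True, so parents (listed first) win ties.
        population = sorted(population + offspring, key=fitness, reverse=True)[:N]
    return None, population


def one_max(x: Sequence[int]) -> int:
    """Number of 1-bits; a simple stand-in objective for illustration only."""
    return sum(x)


tau, final_population = n_plus_n_ea(one_max, n=20, N=4, max_generations=5000, rng=random.Random(0))
print(tau)
```

With $N = 1$ the sketch behaves like a (1 + 1) EA that keeps the parent on ties; some formulations of the (1 + 1) EA accept an equally fit offspring instead.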



2.3 Solvable Rate 

So far we have introduced the problem and algorithm investigated in this paper. In this subsection, we present
the measure of performance for EAs. First, let us review the first hitting time $\tau$, which is formally defined as
below:

$$\tau = \min\{t : x^* \in \xi_t\},$$

where $x^*$ is the global optimum, and $\xi_t$ is the population at the $t$-th generation. As the example presented
in Section 1 shows, the first hitting time may sometimes be unsuitable for comparing the performance of
two EAs. Here we provide an alternative measure, the solvable rate, to deal with the situation in the
aforementioned example. Denoted by $\kappa$, the solvable rate is formally defined by

$$\kappa = P(\tau \prec \mathrm{Poly}(n)),$$

where the event $\tau \prec \mathrm{Poly}(n)$ means that there exists some polynomial function $F(n)$ of the problem size $n$ such
that $\tau < F(n)$ holds for any $n > n_0$, where $n_0$ is a positive constant. Generally speaking, to derive appropriate
bounds for the solvable rate, we have to consider the first hitting time $\tau$. In the next section, we introduce the
mathematical tools for our further investigations.
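Because $\kappa$ is defined asymptotically, it cannot be measured for a single $n$. A rough empirical proxy, sketched below under our own assumptions, is the fraction of independent runs whose first hitting time does not exceed a chosen polynomial budget $F(n)$. The sketch uses a (1 + 1) EA on OneMax purely as a stand-in; the budget $F(n) = 10n^2$ and the names `first_hitting_time` and `estimate_solvable_rate` are hypothetical.

```python
import random
from typing import Optional


def first_hitting_time(n: int, budget: int, rng: random.Random) -> Optional[int]:
    """Run a (1 + 1) EA on OneMax; return tau = min{t : x* in xi_t}, or None if tau > budget."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(1, budget + 1):
        if sum(x) == n:  # xi_t contains the all-ones optimum
            return t
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        if sum(y) >= sum(x):  # accept an offspring that is not worse
            x = y
    return None


def estimate_solvable_rate(n: int, runs: int, seed: int) -> float:
    """Fraction of independent runs with tau <= F(n), using the assumed budget F(n) = 10 n^2."""
    rng = random.Random(seed)
    budget = 10 * n * n
    hits = sum(first_hitting_time(n, budget, rng) is not None for _ in range(runs))
    return hits / runs


print(estimate_solvable_rate(n=16, runs=50, seed=1))
```

Such an estimate only describes one fixed $n$; the results in this paper concern how the probability behaves as $n$ grows.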

3 Analytical Approaches 

In this section, we present the analytical tools utilized in this paper. 
3.1 Probability Inequalities 

First of all, three well-known probabilistic inequalities are needed for our later analysis. They are
presented as the following lemmas:

Lemma 1 (Chernoff bounds [12, 4]) Let $X_1, X_2, \ldots, X_k \in \{0, 1\}$ be $k$ independent random variables with the
same distribution:
$$\forall i \neq j:\; P[X_i = 1] = P[X_j = 1],$$
where $i, j \in \{1, \ldots, k\}$. Let $X$ be the sum of these random variables, i.e., $X = \sum_{i=1}^{k} X_i$. Then we have

• $\forall\, 0 < \psi < 1$: $P[X < (1 - \psi)E[X]] < e^{-E[X]\psi^2/2}$;

• $\forall\, 0 < \psi \le 2e - 1$: $P[X > (1 + \psi)E[X]] < e^{-E[X]\psi^2/4}$;

• $\forall\, \psi > 0$: $P[X > (1 + \psi)E[X]] < \left[\dfrac{e^{\psi}}{(1 + \psi)^{1 + \psi}}\right]^{E[X]}$.
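As a quick sanity check of Lemma 1 (our own illustration, not part of the paper's argument), the Python snippet below compares exact tail probabilities of a binomial random variable, i.e., a sum of $k$ i.i.d. Bernoulli($p$) variables, with the three Chernoff bounds; the values $k = 200$, $p = 0.1$ and the grid of $\psi$ values are arbitrary choices.

```python
import math


def binomial_pmf(k: int, p: float, i: int) -> float:
    """P[X = i] for X ~ Binomial(k, p)."""
    return math.comb(k, i) * p**i * (1 - p) ** (k - i)


def lower_tail(k: int, p: float, psi: float) -> float:
    """Exact P[X < (1 - psi) E[X]]."""
    mu = k * p
    return sum(binomial_pmf(k, p, i) for i in range(k + 1) if i < (1 - psi) * mu)


def upper_tail(k: int, p: float, psi: float) -> float:
    """Exact P[X > (1 + psi) E[X]]."""
    mu = k * p
    return sum(binomial_pmf(k, p, i) for i in range(k + 1) if i > (1 + psi) * mu)


k, p = 200, 0.1
mu = k * p
for psi in (0.1, 0.5, 0.9):  # first bound, 0 < psi < 1
    assert lower_tail(k, p, psi) < math.exp(-mu * psi**2 / 2)
for psi in (0.1, 0.5, 1.0, 2.0, 2 * math.e - 1):  # second bound, 0 < psi <= 2e - 1
    assert upper_tail(k, p, psi) < math.exp(-mu * psi**2 / 4)
for psi in (0.5, 3.0, 10.0):  # third bound, psi > 0
    assert upper_tail(k, p, psi) < (math.exp(psi) / (1 + psi) ** (1 + psi)) ** mu
print("all three bounds hold for k = 200, p = 0.1")
```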



Lemma 2 (Chebyshev Inequality [15]) Let $X$ be a random variable with expectation $E[X]$ and finite
variance $\mathrm{Var}[X]$. Then for any real number $r > 0$,
$$P\big[|X - E[X]| \ge r\big] \le \frac{\mathrm{Var}[X]}{r^2}. \qquad (2)$$



Lemma 3 (Markov Inequality [12]) Let $X \ge 0$ be a random variable with expectation $E[X]$. Then for any $a > 0$,
we have
$$P[X \ge a] \le \frac{E[X]}{a}. \qquad (3)$$
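Similarly, a small numerical check (again our own illustration with arbitrarily chosen parameters) confirms Lemmas 2 and 3 on binomial random variables, for which $E[X] = kp$ and $\mathrm{Var}[X] = kp(1 - p)$; the name `check_markov_chebyshev` is hypothetical.

```python
import math


def binomial_pmf(k: int, p: float, i: int) -> float:
    """P[X = i] for X ~ Binomial(k, p)."""
    return math.comb(k, i) * p**i * (1 - p) ** (k - i)


def check_markov_chebyshev(k: int, p: float) -> bool:
    """Compare exact binomial tails with the Markov and Chebyshev bounds."""
    mean, var = k * p, k * p * (1 - p)
    pmf = [binomial_pmf(k, p, i) for i in range(k + 1)]
    for a in (0.5 * mean, mean, 2 * mean, 5 * mean):
        # Markov: P[X >= a] <= E[X] / a for non-negative X.
        if sum(q for i, q in enumerate(pmf) if i >= a) > mean / a + 1e-12:
            return False
    for s in (0.5, 1.0, 2.0, 3.0):
        r = s * math.sqrt(var)  # a deviation of s standard deviations
        # Chebyshev: P[|X - E[X]| >= r] <= Var[X] / r^2.
        if sum(q for i, q in enumerate(pmf) if abs(i - mean) >= r) > var / r**2 + 1e-12:
            return False
    return True


print(check_markov_chebyshev(50, 0.3), check_markov_chebyshev(200, 0.05))  # True True
```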



3.2 Decomposition of Population Set 

In this subsection, we introduce the concrete definitions and approaches for analyzing the EA on TrapZeros,
which are inherited from our previous investigation [2]. Recall that we have defined the schemata $S_1$, $S_0$ and
$S^*$ in Section 2.1. Further, we denote by $E$ the set of all populations. Based on the
definitions of the three schemata, we now present the decomposition of the population set $E$,
which is necessary for our analytical approach:

• We denote by $E_0$ the set of populations $\xi$ whose best individual belongs to neither
$S_1$ nor $S_0$.

• For any population $\xi$ with its best individual belonging to $S_1$, we define the metric $m^{(A)}(\xi)$:
$$m^{(A)}(\xi) = \min\left\{g^{(A)}(y) : \mathrm{TrapZeros}(y) = \max\{\mathrm{TrapZeros}(z) : z \in \xi\},\ y \in S_1,\ y \in \xi\right\},$$
where $g^{(A)}(y) = n - 2 - \sum_{i=3}^{n} |y_i - 1|$, and $y = (y_1, y_2, \ldots, y_n)$ and $z$ are individuals belonging to $\xi$. Based
on the metric $m^{(A)}(\xi)$, we obtain $n - 1$ population sets $E_p^{(A)}$ ($p = 0, \ldots, n - 2$), where
$$E_p^{(A)} = \{\xi : m^{(A)}(\xi) = p\}, \quad p = 0, \ldots, n - 2.$$
The above definition, along with the fitness function TrapZeros, implies that $E_{n-2}^{(A)}$ is the set
of all populations containing the global optimum $x^* = (1, 1, \ldots, 1)$.
Moreover, for any population $\xi \in E_p^{(A)}$, we define a subset of $\xi$:
$$G_\xi^{(A)} = \{y : \mathrm{TrapZeros}(y) = \max\{\mathrm{TrapZeros}(z) : z \in \xi\},\ y \in S_1,\ y \in \xi\}.$$

• For any population $\xi$ with its best individual belonging to $S_0$, we define the metric $m^{(B)}(\xi)$:
$$m^{(B)}(\xi) = \min\left\{g^{(B)}(y) : \mathrm{TrapZeros}(y) = \max\{\mathrm{TrapZeros}(z) : z \in \xi\},\ y \in S_0,\ y \in \xi\right\},$$
where $g^{(B)}(y) = n - 2 - \sum_{i=3}^{n} y_i$ and $y = (y_1, y_2, \ldots, y_n)$. Based on the metric $m^{(B)}(\xi)$, we obtain $n - 1$
population sets $E_p^{(B)}$ ($p = 0, \ldots, n - 2$), where
$$E_p^{(B)} = \{\xi : m^{(B)}(\xi) = p\}, \quad p = 0, \ldots, n - 2.$$
Moreover, for any population $\xi \in E_p^{(B)}$, we define a subset of $\xi$:
$$G_\xi^{(B)} = \{y : \mathrm{TrapZeros}(y) = \max\{\mathrm{TrapZeros}(z) : z \in \xi\},\ y \in S_0,\ y \in \xi\}.$$

• According to the above definitions, we have
$$E = E_0 \cup \left(\bigcup_{p=0}^{n-2} E_p^{(A)}\right) \cup \left(\bigcup_{p=0}^{n-2} E_p^{(B)}\right).$$
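The decomposition can be made concrete with a short Python sketch that assigns a population to $E_0$, $E_p^{(A)}$ or $E_p^{(B)}$. It relies on two simplifications: $g^{(A)}(y) = n - 2 - \sum_{i=3}^{n} |y_i - 1|$ equals the number of 1-bits among $y_3, \ldots, y_n$, and $g^{(B)}(y)$ equals the number of 0-bits there. Since the TrapZeros formula is not reproduced in this text, the fitness is a parameter; the sketch also assumes, as the fitness ordering of TrapZeros suggests, that the best individuals of a population never lie in both $S_1$ and $S_0$. The name `classify_population` is hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def g_a(y: Sequence[int]) -> int:
    """g^(A)(y) = n - 2 - sum_{i>=3} |y_i - 1|, i.e. the number of 1-bits in positions 3..n."""
    return sum(y[2:])


def g_b(y: Sequence[int]) -> int:
    """g^(B)(y) = n - 2 - sum_{i>=3} y_i, i.e. the number of 0-bits in positions 3..n."""
    return len(y) - 2 - sum(y[2:])


def classify_population(
    population: List[Sequence[int]], fitness: Callable[[Sequence[int]], float]
) -> Tuple[str, Optional[int]]:
    """Return ("A", p), ("B", p) or ("E0", None) following the decomposition of E."""
    best = max(fitness(y) for y in population)
    top = [y for y in population if fitness(y) == best]
    g_set_a = [y for y in top if y[0] == 1 and y[1] == 1]  # G^(A): best individuals in S_1
    if g_set_a:
        return "A", min(g_a(y) for y in g_set_a)  # m^(A)(xi)
    g_set_b = [y for y in top if y[0] == 0 and y[1] == 0]  # G^(B): best individuals in S_0
    if g_set_b:
        return "B", min(g_b(y) for y in g_set_b)  # m^(B)(xi)
    return "E0", None


print(classify_population([[1, 1, 1, 0, 1], [0, 0, 1, 1, 1]], fitness=sum))  # ('A', 2)
```

For example, a population containing the all-ones string of length $n$ is classified as `("A", n - 2)`, matching the remark that $E_{n-2}^{(A)}$ consists of the populations containing $x^*$.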

On the basis of the decomposition of $E$, we introduce the definitions of local optimal individuals (type A and
type B):

Definition 1 (Local Optimal Individual, type A (LOIA)) Given a population set $E_p^{(A)}$ ($p = 0, \ldots, n - 2$)
and a population $\xi \in E_p^{(A)}$, we call an individual $x$ a $p$-LOIA of $\xi$ on $E_p^{(A)}$ (LOIA for short, if we
restrict the discussion to a given $E_p^{(A)}$ and $\xi$) if and only if $x \in G_\xi^{(A)}$. We call an individual $x'$ an advanced
LOIA for every $\xi \in E_p^{(A)}$ (advanced LOIA for short, if we restrict the discussion to a given $E_p^{(A)}$) if and only if $x'$
is the LOIA of a population $\zeta$ satisfying $\zeta \in \bigcup_{i=p+1}^{n-2} E_i^{(A)}$.

Definition 2 (Local Optimal Individual, type B (LOIB)) Given a population set $E_p^{(B)}$ ($p = 0, \ldots, n - 2$)
and a population $\xi \in E_p^{(B)}$, we call an individual $x$ a $p$-LOIB of $\xi$ on $E_p^{(B)}$ (LOIB for short, if we
restrict the discussion to a given $E_p^{(B)}$ and $\xi$) if and only if $x \in G_\xi^{(B)}$. We call an individual $x'$ an advanced
LOIB for every $\xi \in E_p^{(B)}$ (advanced LOIB for short, if we restrict the discussion to a given $E_p^{(B)}$) if and only if $x'$
is the LOIB of a population $\zeta$ satisfying $\zeta \in \bigcup_{i=p+1}^{n-2} E_i^{(B)}$.

So far we have defined LOIA and LOIB so as to characterize the individuals belonging to the attraction
basins of the global and local optima, respectively, which enables us to define the so-called takeover times for the
(N + N) EA. Briefly, the takeover time, proposed by Goldberg and Deb [6], originally measures the number of
generations required by the population to accumulate enough promising individuals under a repeated selection
process. Chen et al. [2] generalized the concept of the "takeover process" to characterize the behavior of
population-based EAs on unimodal problems containing no local optimum. In this paper, on the TrapZeros
problem, we further define two types of takeover processes for the global and local optima, respectively:

Definition 3 ((A, p, ε)-takeover) A population $\xi$ is said to be $(A, p, \varepsilon)$-taken over ($p = 0, \ldots, n - 2$, and $0 < \varepsilon < 1$)
if and only if its number of $p$-LOIAs has reached $\lceil \varepsilon N \rceil$ (i.e., the proportion of LOIAs in $\xi$ has reached $\varepsilon$),
where all advanced LOIAs are pessimistically counted as $p$-LOIAs.

Definition 4 ((B, p, ε)-takeover) A population $\xi$ is said to be $(B, p, \varepsilon)$-taken over ($p = 0, \ldots, n - 2$, and $0 < \varepsilon < 1$)
if and only if its number of $p$-LOIBs has reached $\lceil \varepsilon N \rceil$ (i.e., the proportion of LOIBs in $\xi$ has reached $\varepsilon$),
where all advanced LOIBs are pessimistically counted as $p$-LOIBs.

As pessimistically assumed in [2], throughout the paper we assume that advanced LOIAs (LOIBs)
cannot be generated while a population has not been $(A, p, \varepsilon)$-taken over ($(B, p, \varepsilon)$-taken over). Once a population has been
$(A, p, \varepsilon)$-taken over ($(B, p, \varepsilon)$-taken over), it concentrates on producing advanced LOIAs (LOIBs) for $E_p^{(A)}$
($E_p^{(B)}$). Next, we formally define the $(A, p, \varepsilon)$-takeover time and the $(B, p, \varepsilon)$-takeover time, and then characterize
the processes of generating advanced LOIAs and LOIBs by the so-called $(A, p, \varepsilon)$- and $(B, p, \varepsilon)$-upgrade times,
respectively.

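Given the population $\xi_{t_p} \in E_p^{(A)}$ at the $t_p$-th generation, we define its $(A, p, \varepsilon)$-first hitting time to $E_p^{(A)}$ as
$$\eta_{p,\varepsilon}^{(A)}(\xi_{t_p}) = \min\left\{t - t_p : t \ge t_p,\ \xi_t \in \left(\bigcup_{i=\lceil \varepsilon N \rceil}^{N} E_{p,i}^{(A)}\right) \cup \left(\bigcup_{k=p+1}^{n-2} E_k^{(A)}\right)\right\}, \qquad (4)$$
where $E_{p,i}^{(A)}$ is the set of populations in $E_p^{(A)}$ containing $i$ $p$-LOIAs ($i = 1, \ldots, N$). The expectation
of $\eta_{p,\varepsilon}^{(A)}(\xi_{t_p})$, restricted to a finite $\eta_{p,\varepsilon}^{(A)}(\xi_{t_p})$ and conditional on the starting population $\xi$, is defined by
$$\bar{\eta}_{p,\varepsilon}^{(A)}(\xi) = E\left[\eta_{p,\varepsilon}^{(A)}(\xi_{t_p}) \,\middle|\, \eta_{p,\varepsilon}^{(A)}(\xi_{t_p}) < \infty,\ \xi_{t_p} = \xi\right].$$
The expectation $\bar{\eta}_{p,\varepsilon}^{(A)}(\xi)$ is called the $(A, p, \varepsilon)$-takeover time of population $\xi$. Afterwards, we define the maximal
$(A, p, \varepsilon)$-takeover time as
$$\bar{\eta}_{p,\varepsilon}^{(A)} = \max\{\bar{\eta}_{p,\varepsilon}^{(A)}(\xi) : \xi \in E_p^{(A)}\},$$
and the $(A, \cdot, \varepsilon)$-takeover time is defined by
$$\bar{\eta}_{\varepsilon}^{(A)} = \max\{\bar{\eta}_{p,\varepsilon}^{(A)} : 0 \le p \le n - 2\}. \qquad (5)$$
Similarly, we can define the $(B, p, \varepsilon)$-takeover time $\bar{\eta}_{p,\varepsilon}^{(B)}(\xi)$ and the $(B, \cdot, \varepsilon)$-takeover time $\bar{\eta}_{\varepsilon}^{(B)}$.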

According to the notion presented in [2], the evolution of individuals restricted to a specific attraction basin
can be characterized by repeated "takeover-upgrade processes". Concretely, to reach the optimum in the
attraction basin, the (N + N) EA, with both mutation and selection, often needs a number of steps. At each
step, the EA may first need to accumulate enough promising individuals, while the quality of the individuals
may not be significantly improved. Afterwards, once a considerable number of promising individuals have been
accumulated, the population has a high probability of generating one or more better individuals. Formally,
we further define the so-called $(A, p, \varepsilon)$- and $(B, p, \varepsilon)$-upgrade times. Given the population $\xi_{t_p} \in \bigcup_{i=\lceil \varepsilon N \rceil}^{N} E_{p,i}^{(A)}$
at the $t_p$-th generation, we define its $(A, p, \varepsilon)$-upgrade time as
$$\phi_{p,\varepsilon}^{(A)}(\xi_{t_p}) = \min\left\{t' - t_p : \xi_{t'} \in \bigcup_{k=p+1}^{n-2} E_k^{(A)},\ t_p < t' < \infty\right\}. \qquad (6)$$







Similarly, the $(B, p, \varepsilon)$-upgrade time $\phi_{p,\varepsilon}^{(B)}(\xi_{t_p})$ can be defined. In the meantime, when a population $\xi$ has already been
$(A, p, \varepsilon)$-taken over, the probability of generating at least one advanced LOIA in one generation, denoted by
$u_{p,\varepsilon}^{(A)}$, is given by

where $P(\xi, \zeta)$ is the one-generation transition probability from population $\xi$ to population $\zeta$. The reciprocal of
$u_{p,\varepsilon}^{(A)}$ is an upper bound on the mean $(A, p, \varepsilon)$-upgrade time, due to the properties of the geometric distribution [15].
Similarly, we can define the upgrade probability $u_{p,\varepsilon}^{(B)}$, whose reciprocal bounds the mean $(B, p, \varepsilon)$-upgrade time
from above.
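The geometric-distribution argument can be illustrated numerically. In the sketch below (our own illustration; names and parameters are hypothetical), each generation independently succeeds, i.e., produces an advanced LOIA, with a probability that is never below $u$. The empirical mean waiting time then stays at or below $1/u$, with approximate equality when the probability is exactly $u$.

```python
import random
from typing import Callable


def waiting_time(success_prob: Callable[[int], float], rng: random.Random) -> int:
    """Generations until the first success; generation t succeeds w.p. success_prob(t)."""
    t = 1
    while rng.random() >= success_prob(t):
        t += 1
    return t


def mean_waiting_time(success_prob: Callable[[int], float], trials: int, seed: int) -> float:
    """Empirical mean of the waiting time over independent trials."""
    rng = random.Random(seed)
    return sum(waiting_time(success_prob, rng) for _ in range(trials)) / trials


u = 0.2
constant = mean_waiting_time(lambda t: u, trials=20000, seed=0)  # close to 1/u = 5
# Probability u on odd generations and 0.5 on even ones: never below u, so the mean is at most 1/u.
varying = mean_waiting_time(lambda t: u if t % 2 else 0.5, trials=20000, seed=0)
print(round(constant, 2), round(varying, 2))
```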

3.3 Bounding First Hitting Time 

As defined above, both the $(A, p, \varepsilon)$-takeover time and the $(B, p, \varepsilon)$-takeover time describe the time sufficient for
the EA to accumulate enough promising individuals for generating better individuals. Restricted to $S^*$, the
evolution of individuals towards the global optimum can be characterized as the so-called repeated takeover-upgrade
process: first, promising individuals are accumulated through the combined effect of the selection and
mutation operators; when there are enough promising individuals in the population, the probability of generating
better individuals becomes large enough, so one or more individuals will soon be upgraded to better individuals.
The evolution of individuals restricted to $S_0 \cup (S_1 \setminus S^*)$ can be characterized in a similar way, though the
evolution will eventually lead to the local optimum (instead of the global optimum) if no individual belonging
to $S^*$ has been generated.

Formally, given $0 \le p_1 < p_2 \le n - 2$ and a generation index $t_{p_1}$ such that the population $\xi_{t_{p_1}}$ belongs to
$E_{p_1}^{(A)}$, define the mean first hitting time to $E_{p_2}^{(A)}$ of the (N + N) EA starting from $E_{p_1}^{(A)}$ as



By drift analysis [7], as used in the proof of Proposition 1 in [2], we can combine the aforementioned
repeated takeover-upgrade processes and obtain the following lemmas concerning the mean first hitting time to
different population subsets:

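Lemma 4 On the TrapZeros problem, the mean first hitting time to $E_{p_2}^{(A)}$ of the (N + N) EA starting from
$E_{p_1}^{(A)}$ (where $0 \le p_1 < p_2 \le n - 2$ holds) is bounded from above by a sum over $k = p_1 + 1, \ldots, p_2$, as stated in Eq. (8),
where $\bar{\eta}_{\varepsilon}^{(A)}$ is defined by Eq. (5). Moreover, the mean first hitting time to $E_{p_2}^{(A)}$ of the (N + N) EA starting from $E_{p_1}^{(A)}$
(where $\ln^2 n \le p_1 < p_2 \le n - 2$ holds) is bounded by an $O(\cdot)$ expression that is likewise a sum over
$k = p_1 + 1, \ldots, p_2$, as stated in Eq. (9).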

Interested readers can refer to the appendix of this paper for the detailed proof of the above lemma. On the 
other hand, combining the takeover-upgrade processes for the LOIBs, we have the following lemma: 

Lemma 5 On the TrapZeros problem, given the population size $N = \Omega(n/\ln n)$, the first hitting time to $E_{p_2}^{(B)}$
of the (N + N) EA starting from $E_{p_1}^{(B)}$ (where $0 \le p_1 < p_2 \le n - 2$ holds), conditional on no individual
belonging to $S^*$ having been generated before the EA first reaches $E_{p_2}^{(B)}$, satisfies the following inequality
with an overwhelming probability:
$$T_{p_1,p_2}^{(B)} \le (p_2 - p_1)\left(\bar{\eta}_{\varepsilon}^{(B)} + \ln^3 n\right), \qquad (10)$$
where $T_{p_1,p_2}^{(B)}$ is the first hitting time to $E_{p_2}^{(B)}$ starting from $E_{p_1}^{(B)}$, and $\bar{\eta}_{\varepsilon}^{(B)}$ satisfies the following condition:
$\forall i \in \{0, \ldots, n - 2\}, \forall \xi_{t_i} \in E_i^{(B)}$: $P\big(\eta_{i,\varepsilon}^{(B)}(\xi_{t_i}) \le \bar{\eta}_{\varepsilon}^{(B)}\big) \ge 1 - 1/\mathrm{SuperPoly}(n)$, i.e., this probability is super-polynomially close to 1.

The detailed proof is given in the appendix. 

In this section, we have provided the analytical tools for the theoretical investigations of the (N + N) EA. 
In the following parts of the paper, we will apply the tools to study the performance of the (N + N) EA with 
different population sizes, so as to demonstrate the impact of population size on the performance of EA. 



4 (N + N) EA with N = 1 

In this section, we analyze the performance of the (1 + 1) EA on TrapZeros, which can be considered
a degenerate case of the (N + N) EA. Employing the aforementioned solvable rate as the measure, we obtain
the following result:

Theorem 6 The first hitting time $\tau$ of the (1 + 1) EA on TrapZeros is $O(n^2)$ with probability $1/4 - O(\ln^2 n/n)$.
In other words, the solvable rate $\kappa$ of the (1 + 1) EA on TrapZeros is at least $1/4 - O(\ln^2 n/n)$.

Proof. After initialization, with probability $1/4$, the first and second bits of the initial individual both take
the value 1, i.e., the initial individual belongs to $S_1$. At any generation, the probability that the EA generates an
offspring belonging to $S_0$ is $1/n^2$, since both of the first two bits must flip. Noting that the fitness of any individual belonging to the schemata $\{(1, 0, *, \ldots, *)\}$
and $\{(0, 1, *, \ldots, *)\}$ is strictly smaller than that of any individual belonging to $S_1$, the probability of not
finding any individual belonging to $S_0$ by the $t$-th generation is at least $(1 - 1/n^2)^t/4$. Hence, for
any $t \le (2en - 1)\ln^2 n$, by Bernoulli's inequality the above probability is at least $(1 - 1/n^2)^{2en\ln^2 n}/4 \ge (1 - 2e\ln^2 n/n)/4$.

Given the condition that the EA does not find any individual belonging to $S_0$ before the $((2en - 1)\ln^2 n)$-th
generation, we then estimate the time sufficient for the EA to find some solution belonging to the schema $S^*$.
The reason for considering the schema $S^*$ is that, once some individual belonging to $S^*$ is found, the elitist selection
operator will not accept any individual belonging to $S_0$, and the EA will eventually reach the global optimum. It is
easy to see that, at the $t$-th generation ($t < (2en - 1)\ln^2 n$), the EA has a probability of no less
than $(1/n)(1 - 1/n)^{n-3} \ge 1/(en)$ of finding an individual with better fitness, conditional on the event that the
EA does not find any individual belonging to $S_0$ before the $((2en - 1)\ln^2 n)$-th generation. Hence, according to
Chernoff bounds, with an extra probability of $1 - e^{-\Omega(\ln^2 n)}$, the EA finds some individual belonging to $S^*$ no
later than the $((2en - 1)\ln^2 n)$-th generation.

Once the EA has found a solution belonging to $S^*$, it will continue to find better solutions and eventually
find the global optimum. In this phase, the EA has a probability of no less than $(1/n)(1 - 1/n)^{n-3} \ge 1/(en)$
of finding an individual with better fitness, and in the worst case the Hamming distance between the current individual
and the global optimum is $n - \ln^2 n - 2$. Hence, by Chernoff bounds, with an extra probability of $1 - e^{-\Omega(n)}$, the
EA finds the global optimum within an extra $O(n^2)$ generations.

Combining the above propositions, we have proven that the first hitting time $\tau$ of the (1 + 1) EA on
TrapZeros is $O(n^2)$ with a probability of $(1/4)(1 - 2e\ln^2 n/n)(1 - e^{-\Omega(\ln^2 n)} - e^{-\Omega(n)}) = 1/4 - O(\ln^2 n/n)$. $\square$
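The elementary inequalities used in this proof can be checked numerically. The snippet below is an illustration only and does not replace the asymptotic argument: for a few arbitrary values of $n$ it verifies Bernoulli's bound $(1 - 1/n^2)^{2en\ln^2 n} \ge 1 - 2e\ln^2 n/n$ and the per-generation improvement bound $(1/n)(1 - 1/n)^{n-3} \ge 1/(en)$.

```python
import math

for n in (10, 100, 1_000, 10_000, 100_000):
    ln2n = math.log(n) ** 2
    # Probability of creating no S_0 offspring in 2en ln^2 n generations;
    # log1p keeps the computation accurate when 1/n^2 is tiny.
    survive = math.exp(2 * math.e * n * ln2n * math.log1p(-1 / n**2))
    assert survive >= 1 - 2 * math.e * ln2n / n  # Bernoulli's inequality
    # Probability of flipping one given bit while keeping n - 3 other bits unchanged.
    improve = (1 / n) * (1 - 1 / n) ** (n - 3)
    assert improve >= 1 / (math.e * n)
    print(n, round(survive, 4), round(1 - 2 * math.e * ln2n / n, 4), round(improve * math.e * n, 4))
```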

The above theorem demonstrates that, when the population size is extremely small ($N = 1$), the (N + N) EA can
find the global optimum of TrapZeros within $O(n^2)$ generations with at least a constant probability.



5 (N + N) EA with N = O(ln n)



In this section, the population size of the (N + N) EA grows to $N = O(\ln n)$. The investigation begins with the
following lemma concerning the number of LOIAs:

Lemma 7 Let $X_t$ be the total number of LOIAs at the end of the $t$-th generation. For the (N + N) EA with
truncation selection, we have
$$P\left(X_{t+1} \ge (1 + ch)X_t \,\middle|\, X_t < \frac{N}{1 + ch},\ S_0 \cap \xi_{t+1} = \emptyset\right) \ge 1 - \frac{1 - h}{(1 - c)^2 X_t h},$$
where $h$ is the probability that an old LOIA generates a new LOIA, and $c \in (0, 1)$ is a constant.
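One natural reading of Lemma 7 (its proof is in the appendix, so this is our own reconstruction) is Chebyshev's inequality applied to the number $B \sim \mathrm{Binomial}(X_t, h)$ of new LOIAs, assuming each old LOIA independently creates a new one with probability $h$ and no LOIA is lost in truncation selection. Then $P(B < chX_t) \le \mathrm{Var}[B]/((1 - c)hX_t)^2 = (1 - h)/((1 - c)^2 X_t h)$. The snippet compares the exact binomial probability with this bound for a few arbitrary parameter values; the function names are hypothetical.

```python
import math


def prob_enough_new(x: int, h: float, c: float) -> float:
    """Exact P[B >= c*h*x] for B ~ Binomial(x, h), the assumed number of new LOIAs."""
    return sum(
        math.comb(x, i) * h**i * (1 - h) ** (x - i)
        for i in range(x + 1)
        if i >= c * h * x
    )


def lemma7_bound(x: int, h: float, c: float) -> float:
    """Right-hand side of Lemma 7: 1 - (1 - h) / ((1 - c)^2 * x * h)."""
    return 1 - (1 - h) / ((1 - c) ** 2 * x * h)


for x in (5, 10, 50, 200):
    for h in (0.2, 0.37, 0.6):
        for c in (0.05, 0.1, 0.5):
            assert prob_enough_new(x, h, c) >= lemma7_bound(x, h, c)
print(prob_enough_new(5, 0.2, 0.1), lemma7_bound(5, 0.2, 0.1))
```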

The proof of the above lemma is given in the appendix. Similar to Lemma 7, which focuses on LOIAs, the following
lemma about LOIBs holds:

Lemma 8 Let $Y_t$ be the total number of LOIBs at the end of the $t$-th generation. For the (N + N) EA with
truncation selection, we have
$$P\left(Y_{t+1} \ge (1 + ch')Y_t \,\middle|\, Y_t < \frac{N}{1 + ch'},\ S^* \cap \xi_{t+1} = \emptyset\right) \ge 1 - \frac{1 - h'}{(1 - c)^2 Y_t h'},$$
where $h'$ is the probability that an old LOIB generates a new LOIB, and $c \in (0, 1)$ is a constant.

For the sake of brevity, we do not give the detailed proof of the above lemma; interested readers can refer to
the proof of Lemma 7.

By Lemma 7, we are able to bound the takeover time from above, which enables us to prove the following
result:

Theorem 9 Given a population size $N$ satisfying $N = O(\ln n)$ and $N = \omega(1)$, the first hitting time $\tau$ of the
(N + N) EA on TrapZeros is bounded from above by a polynomial of $n$ with a probability of $1/\mathrm{Poly}(n)$¹. In other words, the solvable rate $\kappa$ of
the (N + N) EA on TrapZeros is at least $1/\mathrm{Poly}(n)$, the reciprocal of some polynomial function of
the problem size $n$.



¹ In this paper, $1/\mathrm{Poly}(n)$ refers to some positive function of the problem size $n$ whose reciprocal is bounded from above by a
polynomial function of $n$.
Proof. The proof consists of two major steps, which prove the following propositions, respectively:

Proposition 9.1 The probabilities of the following two events are both $1/\mathrm{Poly}(n)$: 1) the initial individuals all belong to
$S_1$; 2) the EA does not find any individual belonging to $S_0$ within $n\ln^3 n/N$ generations.

Proposition 9.2 The maximal $(A, p, 5/(5 + c))$-takeover time ($p \in \{1, \ldots, n - 2\}$, where $c \in (0, 1)$ is a constant) of the (N + N)
EA on TrapZeros, conditional on the above two events, is bounded from above.

Given the upper bound on the takeover times obtained when proving Proposition 9.2, we can use Lemma 4
to obtain a conditional expected first hitting time of the EA, which immediately leads to the theorem by
Eq. (3) (Markov inequality).

Proof of Proposition 9.1

We begin with Proposition 9.1. After the initialization of the EA, with probability
$1/4^N$, the first and second bits of all initial individuals take the value 1, i.e., the initial individuals
all belong to $S_1$. In the meantime, the probability for an individual belonging to $S_1$ to generate an offspring
belonging to $S_0$ is $1/n^2$. On the other hand, since the fitness of every individual belonging to the schemata
$\{(1, 0, *, \ldots, *)\}$ or $\{(0, 1, *, \ldots, *)\}$ is strictly smaller than that of any individual belonging to $S_1$, the truncation
selection of the (N + N) EA always immediately eliminates all newly generated individuals belonging
to the schemata $\{(1, 0, *, \ldots, *)\}$ and $\{(0, 1, *, \ldots, *)\}$. As a consequence, the probability of
not finding any individual belonging to $S_0$ before the $t$-th generation is at least $(1 - 1/n^2)^{Nt}/4^N$. In other
words, we have



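$$P\left(\xi_1 \subseteq S_1,\ \bigcap_{t=1}^{n\ln^3 n/N} \{S_0 \cap \xi_t = \emptyset\}\right) \ge \frac{(1 - 1/n^2)^{n\ln^3 n}}{4^N} \ge \frac{1 - 2\ln^3 n/n}{4^N}. \qquad (11)$$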



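Since $N = O(\ln n)$, the factor $4^N = n^{O(1)}$ is polynomially bounded, so the right-hand side of Eq. (11) is $1/\mathrm{Poly}(n)$. So far we have proven Proposition 9.1.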



Proof of Proposition 9.2

Concerning Proposition 9.2, we focus on the $(A, p, \varepsilon)$-takeover time, where $p \in \{1, \ldots, n - 2\}$. Here we
need to consider two cases according to the value of $p$: 1) $p \in \{1, \ldots, \ln^2 n\}$; 2) $p \in \{\ln^2 n + 1, \ldots, n - 2\}$.

Let us study the first case, where $p \in \{1, \ldots, \ln^2 n\}$. We note that at every generation the
probability that an old LOIA generates a new LOIA is no smaller than $(1 - p_m)^n = (1 - 1/n)^n \ge 1/5$ ($n \ge 2$),
where $p_m = 1/n$ is the mutation rate of the EA and $(1 - p_m)^n$ is the probability that an old LOIA generates an
offspring identical to itself. Hence, in the $(A, p, 5/(5 + c))$-takeover process (where we let $\varepsilon = 5/(5 + c)$,
and $c \in (0, 1 - 2/\sqrt{5})$ is a positive constant) starting with $X_t = 1$ (in our worst-case analysis, we consider a population with a unique
LOIA at the beginning of the $(A, p, 5/(5 + c))$-takeover process), the expected number
of generations spent by the EA to accumulate 5 LOIAs is at most $5/(1/5) = 25$.

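On the other hand, for $X_t \in [5, 5N/(5+c)]$, according to Lemma 7 we further obtain the following inequality:

$$
\begin{aligned}
&P\left(X_{t+1} \ge (1 + c/5)X_t \,\middle|\, X_t \in [5, 5N/(5+c)],\ S_0 \cap \xi_{t+1} = \emptyset\right)\\
&\quad\ge P\left(X_{t+1} \ge (1 + ch)X_t \,\middle|\, X_t \in [5, 5N/(5+c)],\ S_0 \cap \xi_{t+1} = \emptyset\right)\\
&\quad\ge 1 - \frac{1 - h}{(1 - c)^2 X_t h} \ge 1 - \frac{4}{5(1 - c)^2} = \alpha,
\end{aligned}
\tag{12}
$$

where $c \in (0, 1 - 2/\sqrt{5})$ is a constant, $h \ge (1 - 1/n)^n > 1/5$ is the probability that an old LOIA generates a new LOIA, and $\alpha = 1 - 4/(5(1 - c)^2) < 1$ is a positive constant. (The first inequality holds because $h > 1/5$ implies $(1 + ch)X_t > (1 + c/5)X_t$; the last one holds because $X_t \ge 5$ and $(1 - h)/h < 4$.) Now we can estimate an upper bound for the takeover times, conditional on the event

$$
\bigwedge_{t=0}^{n\ln^3 n/N}\left(S_0 \cap \xi_{t+1} = \emptyset\right). \tag{13}
$$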
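A small Monte Carlo sketch can illustrate why the Chebyshev-type bound $1 - (1 - h)/((1 - c)^2 X_t h) \ge \alpha = 1 - 4/(5(1 - c)^2)$ yields a constant per-generation success probability. It assumes (as in the proof of Lemma 7) that all new LOIAs are accepted, so that $X_{t+1} = X_t + Z_{t+1}$ with $Z_{t+1} \sim \mathrm{Bin}(X_t, h)$, and it uses the pessimistic value $h = (1 - 1/n)^n$; this simplified model and the function names are assumptions of the sketch, not part of the paper.

```python
import random


def alpha(c: float) -> float:
    """Constant lower bound alpha = 1 - 4 / (5 (1 - c)^2)."""
    return 1.0 - 4.0 / (5.0 * (1.0 - c) ** 2)


def chebyshev_bound(x_t: int, h: float, c: float) -> float:
    """Middle bound 1 - (1 - h) / ((1 - c)^2 x_t h)."""
    return 1.0 - (1.0 - h) / ((1.0 - c) ** 2 * x_t * h)


def success_rate(x_t: int, n: int, c: float, trials: int = 5000, seed: int = 1) -> float:
    """Estimate P(X_{t+1} >= (1 + c/5) X_t) under the simplified model."""
    h = (1.0 - 1.0 / n) ** n  # pessimistic per-LOIA copy probability
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = sum(rng.random() < h for _ in range(x_t))  # Z_{t+1} ~ Bin(x_t, h)
        if x_t + z >= (1.0 + c / 5.0) * x_t:
            hits += 1
    return hits / trials


c = 0.05  # must lie in (0, 1 - 2/sqrt(5)) ~ (0, 0.1056)
print(alpha(c), success_rate(5, 100, c))
```

The simulated rates sit well above the Chebyshev bound, which in turn stays above $\alpha$: the bound is loose, but it is a constant independent of $n$ and $N$, which is what the takeover argument needs.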



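Here we call the event "$X_{t+1} \ge (1 + c/5)X_t$" a success. Let $I$ be the number of successes sufficient for the population to be $(A, p, \epsilon)$-taken over. Starting from 5 LOIAs, each success multiplies the number of LOIAs by at least $1 + c/5 = 1/\epsilon$, so $I$ can be estimated by

$$
I = \min\left\{ i \in \mathbb{N} : 5\,\epsilon^{-i} \ge \lceil \epsilon N \rceil \right\}, \tag{14}
$$

where $\epsilon = 5/(5+c)$. As a consequence, we obtain

$$
I = \left\lceil \frac{\ln 5 - \ln\lceil \epsilon N\rceil}{\ln \epsilon} \right\rceil \le \frac{\ln 5 - \ln\lceil \epsilon N\rceil}{\ln \epsilon} + 1 = O(\ln N).
$$

Hence, $I = O(\ln N)$ successes are sufficient for $(A, p, 5/(5+c))$-takeover (in addition to an average of 25 generations that are sufficient for accumulating 5 LOIAs). According to Eq. (12), the expected number of generations sufficient for $(A, p, 5/(5+c))$-takeover ($p \in \{1, \ldots, \ln^2 n\}$), conditional on the event described in Eq. (13), is at most $I/\alpha + 25$.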
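The success count is easy to tabulate. Taking $I$ to be the smallest $i$ with $5(1 + c/5)^i \ge \lceil \epsilon N\rceil$, where $\epsilon = 5/(5+c)$, the sketch below (hypothetical helper names) finds $I$ by direct search and compares it with the closed-form bound $(\ln 5 - \ln\lceil \epsilon N\rceil)/\ln\epsilon + 1$, illustrating the $O(\ln N)$ growth.

```python
import math


def successes_needed(N: int, c: float) -> int:
    """Smallest i such that 5 * (1 + c/5)^i >= ceil(eps * N), eps = 5 / (5 + c)."""
    eps = 5.0 / (5.0 + c)
    target = math.ceil(eps * N)
    i, count = 0, 5.0
    while count < target:
        count *= 1.0 + c / 5.0  # one success: X_{t+1} >= (1 + c/5) X_t
        i += 1
    return i


def closed_form_bound(N: int, c: float) -> float:
    """Upper bound (ln 5 - ln ceil(eps N)) / ln eps + 1."""
    eps = 5.0 / (5.0 + c)
    return (math.log(5) - math.log(math.ceil(eps * N))) / math.log(eps) + 1.0


for N in (10, 100, 10**4, 10**6):
    print(N, successes_needed(N, 0.05), round(closed_form_bound(N, 0.05), 2))
```

Multiplying $N$ by 100 adds roughly the same number of successes each time, as expected from a logarithmic count.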

Similarly to the case $p \in \{1, \ldots, \ln^2 n\}$, we study the $(A, p, 5/(5+c))$-takeover time for the case $p \in \{\ln^2 n + 1, \ldots, n - 2\}$. The only difference between the two cases is that the former is analyzed under the condition that no individual belonging to $S_0$ has been generated in any generation (summarized in Eq. (13)), while the latter does not require such a condition. The obtained takeover time is similar: the expected number of generations sufficient for any $(A, p, 5/(5+c))$-takeover ($p \in \{\ln^2 n + 1, \ldots, n - 2\}$) is at most $I/\alpha + 25$.

In the above analysis, we have analyzed in detail the $(A, p, 5/(5+c))$-takeover times for any $p \in \{1, \ldots, n - 2\}$, which completes the proof of Proposition 9.2.



Proof of Result 

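The rest of the proof is to estimate the $(A, p, 5/(5+c))$-upgrade times $1/u^{(A)}_{p,5/(5+c)}$ with respect to the $(A, p, 5/(5+c))$-takeover times, and then to follow the technique introduced in Lemma 4. Concerning the former point, we again consider two cases, as we did for estimating the $(A, p, 5/(5+c))$-takeover times. For the first case, where $p \in \{1, \ldots, \ln^2 n\}$, we estimate the $(A, p, 5/(5+c))$-upgrade time $1/u^{(A)}_{p,\epsilon}$ under the condition described in Eq. (13). Once the population of the EA belongs to $\bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{p,i}$, each of its at least $\lceil \epsilon N\rceil$ LOIAs generates an offspring with more leading 1-bits with probability at least $(1/n)(1 - 1/n)^{n-1} \ge 1/(en)$ (it suffices to flip the first 0-bit after the leading 1-bits and no other bit), so that

$$
u^{(A)}_{p,\epsilon} \ge 1 - \left(1 - \frac{1}{en}\right)^{\lceil \epsilon N\rceil},
$$

where $\epsilon = 5/(5+c)$. Hence, using $1 - (1 - x)^m \ge 1 - e^{-xm} \ge xm/(1 + xm)$, we have

$$
\frac{1}{u^{(A)}_{p,\epsilon}} \le \frac{1}{1 - \left(1 - \frac{1}{en}\right)^{\lceil \epsilon N\rceil}} \le 1 + \frac{en}{\lceil \epsilon N\rceil} = O\!\left(\frac{n}{N}\right).
$$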
Similarly, for the second case $p \in \{\ln^2 n + 1, \ldots, n - 2\}$, we can also obtain the corresponding $(A, p, \epsilon)$-upgrade time:

$$
\frac{1}{u^{(A)}_{p,\epsilon}} \le 1 + \frac{en}{\lceil \epsilon N\rceil} = O\!\left(\frac{n}{N}\right).
$$

The only difference between the two cases is that the result for the first case ($p \in \{1, \ldots, \ln^2 n\}$) holds when the condition described in Eq. (13) holds, while the second case ($p \in \{\ln^2 n + 1, \ldots, n - 2\}$) does not require such a condition.
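Upgrade-time bounds of this kind rest on the elementary inequality $1/(1 - (1 - x)^m) \le 1 + 1/(xm)$ for $x \in (0, 1)$ and $m \ge 1$. The sketch below checks it numerically with an assumed per-LOIA upgrade probability $x = 1/(en)$ and $m$ LOIAs acting independently; the function names are hypothetical.

```python
import math


def inverse_upgrade_prob(n: int, m: int) -> float:
    """1 / (1 - (1 - x)^m) with x = 1/(e n): reciprocal of the probability that
    at least one of m LOIAs upgrades, each independently with probability x."""
    x = 1.0 / (math.e * n)
    return 1.0 / -math.expm1(m * math.log1p(-x))  # -expm1(...) = 1 - (1 - x)^m


def upper_bound(n: int, m: int) -> float:
    """The bound 1 + e n / m, i.e. 1 + 1/(x m)."""
    return 1.0 + math.e * n / m


for m in (1, 50, 1000):
    print(m, inverse_upgrade_prob(10**4, m), upper_bound(10**4, m))
```

With $m = \lceil \epsilon N\rceil$ and $N \le n$, the right-hand side is $O(n/N)$.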

As a consequence, by Lemma 2 we can estimate the upper bound of the expected first hitting time (to the population subset $E^{(A)}_{\ln^2 n+1}$) of the EA, conditional on the event described in Eq. (13). Formally, let $\tau_{S^*}$ be the first hitting time (to the population subset $E^{(A)}_{\ln^2 n+1}$), and let $\bar{\tau}_{S^*}$ be the above conditional expectation. According to Lemma 5, derived from the technique presented in [2], the asymptotic order of such a conditional expectation is no larger than

$$
\ln^2 n\left(\bar{\eta}^{(A)}_{\epsilon} + \max_{p \le \ln^2 n}\left\{1/u^{(A)}_{p,\epsilon}\right\}\right) \le \ln^2 n\left(I/\alpha + 25 + O(n/N)\right) = \ln^2 n\left(O(\ln\ln n) + O(n/N)\right) = O\!\left(\ln^2 n\ln\ln n + \frac{n\ln^2 n}{N}\right),
$$

where $I/\alpha = O(\ln N) = O(\ln\ln n)$ because $N = O(\ln n)$. According to the Markov inequality [3], under the condition described in Eq. (13), with probability at least 1/2 the first hitting time to the population subset $E^{(A)}_{\ln^2 n+1}$ (i.e., $\tau_{S^*}$) is less than $2\bar{\tau}_{S^*} = o(n\ln^3 n/N)$. Combining this result with the proof of Proposition 9.1 (which estimates the probability that the condition described in Eq. (13) holds), we know that with probability at least $(1 - 2\ln^3 n/n)\cdot(1/2)/4^N$, the EA has found an individual belonging to $S^*$ within $2\bar{\tau}_{S^*} = O(\ln^2 n\ln\ln n + n\ln^2 n/N)$ generations.

Afterwards, following the same technique, we further bound from above the expected first hitting time to $E^{(A)}_{n-2}$ (the population subset consisting of all populations containing the global optimum), with the starting point (initial population) belonging to $\bigcup_{j=\ln^2 n+1}^{n-2} E^{(A)}_j$. Formally, given the above starting point of the EA, let $\tau'$ be the first hitting time to $E^{(A)}_{n-2}$, and let $\bar{\tau}'$ be the expectation of $\tau'$. According to the same lemma, the asymptotic order of $\bar{\tau}'$ is no larger than

$$
(n - \ln^2 n - 2)\left(\bar{\eta}^{(A)}_{\epsilon} + \max_{p > \ln^2 n}\left\{1/u^{(A)}_{p,\epsilon}\right\}\right) = (n - \ln^2 n - 2)\left(O(\ln\ln n) + O(n/N)\right) = O\!\left(n\ln\ln n + \frac{n^2}{N}\right).
$$

By the Markov inequality, with probability at least 1/2, the first hitting time $\tau'$ is less than $2\bar{\tau}' = O(n\ln\ln n + n^2/N)$.

Combining the above results, we obtain that the first hitting time of the (N + N) EA (i.e., $\tau_{S^*} + \tau'$) is upper bounded by $2\bar{\tau}_{S^*} + 2\bar{\tau}' = O(n^2/N)$ with probability at least $(1 - 2\ln^3 n/n)\cdot(1/2)\cdot(1/2)/4^N = (1 - 2\ln^3 n/n)/4^{N+1}$. Noting that the population size satisfies $N = O(\ln n)$, the above probability is $1/\mathrm{Poly}(n)$. Hence, we have proven the whole theorem. □
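For completeness, the last asymptotic steps can be spelled out as follows. Here $k > 0$ is a constant introduced only for this derivation, with $N \le k\ln n$, and $n$ is assumed large enough that $2\ln^3 n/n \le 1/2$.

```latex
\begin{aligned}
4^{N+1} &= 4\, e^{N \ln 4} \le 4\, e^{k \ln 4 \cdot \ln n} = 4\, n^{k \ln 4},\\
\frac{1 - 2\ln^3 n/n}{4^{N+1}} &\ge \frac{1/2}{4\, n^{k \ln 4}} = \frac{1}{8\, n^{k \ln 4}} = \frac{1}{\mathrm{Poly}(n)},\\
O\!\left(n \ln\ln n + \frac{n^2}{N}\right) &= O\!\left(\frac{n^2}{N}\right), \quad \text{since } \frac{n^2}{N} \ge \frac{n^2}{k \ln n} = \omega(n \ln\ln n),\\
O\!\left(\ln^2 n \ln\ln n + \frac{n \ln^2 n}{N}\right) &= O\!\left(\frac{n^2}{N}\right), \quad \text{since } \ln^2 n \le n \text{ for large } n.
\end{aligned}
```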

6 (N + N) EA with N = Ω(n/ln n)

In this section, we further increase the population size, and study the performance of the EA on the basis of our 
previous results. We have the following result: 

Theorem 10 Given a polynomial population size $N$ satisfying $N = \Omega(n/\ln n)$, the first hitting time $\tau$ of the (N + N) EA on TrapZeros is super-polynomial with an overwhelming probability. In other words, the solvable rate $k$ of the (N + N) EA on TrapZeros is super-polynomially close to 0.

Proof. The proof of Theorem 10 contains several steps, in which we focus on the different propositions required for proving the whole theorem:

10.1 After initialization, the probability that all the individuals in the population contain no more than $3\ln^2 n/4$ 1-bits among the leading $\ln^2 n + 2$ bits is super-polynomially close to 1.

10.2 With an overwhelming probability, the EA cannot find any individual belonging to $S^*$ before the $\eta$-th generation, where $\eta = \ln N \cdot \ln\ln n$. With an overwhelming probability, there are still at least $\ln^2 n/16$ 0-bits between the $(\ln^2 n/8)$-th and $(\ln^2 n + 3)$-th bits of each individual at the end of the $\eta$-th generation.

10.3 No later than the $\eta$-th generation, the population will be taken over by the individuals belonging to $S_0$ with an overwhelming probability.

10.4 After the $\eta$-th generation, the probability that an individual belonging to $S^*$ is generated via direct mutation from an individual belonging to $S_0$ is super-polynomially close to 0.

Proof of Proposition 10.1

The polynomial population size $N = \Omega(n/\ln n)$ implies that the initial population contains one or more individuals belonging to $S_0$ with probability $1 - (1 - 1/4)^N = 1 - (3/4)^{\Omega(n/\ln n)}$, which is an overwhelming probability. Meanwhile, the probability of generating one or more individuals belonging to $S^*$ is $1 - (1 - (1/2)^{\ln^2 n + 2})^N \le N/2^{\ln^2 n + 2}$, which is super-polynomially close to 0 because $N$ is a polynomial function of the problem size $n$. Hence, after initialization the best individual in the population belongs to $S_0$ with an overwhelming probability. On the other hand, concerning the number of 1-bits among the leading $\ln^2 n + 2$ bits of every initial individual, we apply Chernoff bounds and obtain that the probability of having no more than $3\ln^2 n/4$ 1-bits is at least $1 - e^{-\ln^2 n/32}$, which is an overwhelming probability. As a consequence, by a union bound over the polynomially many individuals, after initialization the probability that all the individuals in the population contain no more than $3\ln^2 n/4$ 1-bits among the leading $\ln^2 n + 2$ bits is also an overwhelming one.
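The Chernoff estimate above can be compared with the exact binomial tail. The sketch assumes, as in the proof, that each of the leading $m$ bits of an initial individual is an independent fair coin; rounding $\ln^2 n$ down to get $m = \lfloor \ln^2 n\rfloor + 2$ is a choice made here, and the function names are hypothetical.

```python
import math


def exact_tail(n: int) -> float:
    """P(Bin(m, 1/2) > 3 ln^2 n / 4) with m = floor(ln^2 n) + 2 leading bits."""
    L = math.log(n) ** 2
    m = math.floor(L) + 2
    threshold = math.floor(3 * L / 4)  # "no more than 3 ln^2 n / 4" 1-bits
    favourable = sum(math.comb(m, k) for k in range(threshold + 1, m + 1))
    return favourable / 2**m


def chernoff_bound(n: int) -> float:
    """The bound e^{-ln^2 n / 32} used in the text."""
    return math.exp(-math.log(n) ** 2 / 32)


for n in (10**2, 10**4, 10**6):
    print(n, exact_tail(n), chernoff_bound(n))
```

The exact tail is far below $e^{-\ln^2 n/32}$; multiplying either by a polynomial $N$ (the union bound over individuals) still leaves a super-polynomially small failure probability as $n$ grows.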

Proof of Proposition 10.2

As mentioned, we define $\eta = \ln N \cdot \ln\ln n = \Theta(\ln n \cdot \ln\ln n)$ (since $N = \Omega(n/\ln n)$ is polynomial). We now prove that within $\eta$ generations, the probability of finding individuals belonging to $S^*$ is super-polynomially close to 0. Denote by $C$ the event "the first 0-bit among the leading $\ln^2 n + 2$ bits of an individual is flipped, while the 1-bits before the flipped 0-bit (i.e., the leading 1-bits) are not flipped". Optimistically, we assume that event $C$ always happens before the $\eta$-th generation. Under this circumstance, with the overwhelming probability $1 - O(1/n^{\ln\ln n})$, at most $\ln\ln n$ 0-bits among the leading $\ln^2 n + 2$ bits of an individual can be flipped simultaneously in each generation. Given a polynomial population size $N$, this implies that in every generation the maximal number of 1-bits among the leading $\ln^2 n + 2$ bits of each individual in the population can increase by at most $\ln\ln n$ with an overwhelming probability. Combining this fact with Proposition 10.1, we know that in order to find an individual belonging to $S^*$, we need at least $(\ln^2 n + 2 - 3\ln^2 n/4)/\ln\ln n > \ln^2 n/(4\ln\ln n)$ generations with an overwhelming probability. Since $\eta = \Theta(\ln n \cdot \ln\ln n) = o(\ln^2 n/(4\ln\ln n))$, $\eta$ generations are not enough for the EA to generate an individual belonging to $S^*$ with an overwhelming probability.

In the meantime, the fact that (with an overwhelming probability) at most $\ln\ln n$ 0-bits among the leading $\ln^2 n + 2$ bits of an individual can be flipped simultaneously in each generation also implies that (with an overwhelming probability) there are still at least $7\ln^2 n/8 + 2 - 3\ln^2 n/4 - O(\eta \cdot \ln\ln n) > \ln^2 n/16$ 0-bits between the $(\ln^2 n/8)$-th and $(\ln^2 n + 3)$-th bits of each individual at the end of the $\eta$-th generation.

Proof Sketch of Proposition 10.3

Next, we need to prove that within $\eta$ generations, the population will be taken over by the individuals belonging to $S_0$ (i.e., $(B, 0, 1)$-takeover) with an overwhelming probability. This proposition can be proven by a technique similar to the one utilized in the proof of Theorem 9, i.e., by applying the Chebyshev inequality (Lemma 5) to establish the probability of "a success". More specifically, let $Y_t$ be the number of LOIBs at the $t$-th generation. For $Y_t \in [5, 5N/(5+c)]$, we define the event "$Y_{t+1} \ge (1 + c/5)Y_t$" to be a "success". According to Lemma 5, we obtain

$$
\begin{aligned}
&P\left(Y_{t+1} \ge (1 + c/5)Y_t \,\middle|\, Y_t \in [5, 5N/(5+c)],\ S^* \cap \xi_{t+1} = \emptyset\right)\\
&\quad\ge P\left(Y_{t+1} \ge (1 + ch')Y_t \,\middle|\, Y_t \in [5, 5N/(5+c)],\ S^* \cap \xi_{t+1} = \emptyset\right)\\
&\quad\ge 1 - \frac{1 - h'}{(1 - c)^2 Y_t h'} \ge 1 - \frac{4}{5(1 - c)^2} = \alpha',
\end{aligned}
$$

where $c \in (0, 1 - 2/\sqrt{5})$ is a constant, $h' \ge (1 - 1/n)^n > 1/5$ is the probability that an old LOIB generates a new LOIB, and $\alpha' = 1 - 4/(5(1 - c)^2) < 1$ is a positive constant. For the sake of brevity, we omit the mathematical details of determining the number of successes sufficient for $(B, 0, 1)$-takeover; one can refer to the proof of Proposition 9.2 for more details. Next we prove that the number of generations sufficient for $(B, 0, 1)$-takeover is no larger than $\eta$ with an overwhelming probability.

Assume that we have already proven that the number of successes sufficient for $(B, 0, 1)$-takeover (using Lemma 8 and the proof idea of Proposition 9.2), denoted by $I'$, satisfies $I' < \ln N/\ln(1/\epsilon) + 3$ (where $\epsilon = 5/(5+c)$ and $c$ is a constant), and that the probability of achieving a success in each generation (belonging to the $(B, 0, 1)$-takeover process) is no less than a constant $\alpha' \in (0, 1)$. For the $(B, 0, 1)$-takeover process, let $suc(T)$ be the number of successes that happened among $T$ generations. While the number of successes is smaller than $I'$ (see footnote 2), the evolution in each generation can be considered as a Bernoulli trial: the probability of having a success is at least $\alpha'$. According to Chernoff bounds, the probability that among $T$ generations the number of successes $suc(T)$ is no smaller than $(1 - \delta)\cdot E[suc(T)] \ge (1 - \delta)\cdot T\alpha'$ is at least $1 - e^{-T\alpha'\delta^2/2}$, where $\delta \in (0, 1)$ is a constant. By setting $T = \Theta(\ln N \cdot \sqrt{\ln\ln n}) \ge I'/((1 - \delta)\alpha')$, we obtain that the probability that $\Theta(\ln N \cdot \sqrt{\ln\ln n})$ generations are sufficient for obtaining more than $I' = O(\ln N)$ successes is at least

$$
1 - e^{-T\alpha'\delta^2/2} = 1 - e^{-\Theta(\ln N \cdot \sqrt{\ln\ln n})\,\alpha'\delta^2/2},
$$

which is an overwhelming probability. Summarizing the above discussion, we know that with an overwhelming probability it takes at most $O(\ln N \cdot \sqrt{\ln\ln n})$ generations for the population to be $(B, 0, 1)$-taken over. Since $\eta = \ln N \cdot \ln\ln n = \omega(\ln N \cdot \sqrt{\ln\ln n})$, we have proven Proposition 10.3.
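The Chernoff step above treats generations as independent Bernoulli trials with success probability at least $\alpha'$. The sketch below (hypothetical helper names; independence with success probability exactly $\alpha'$ is the modeling assumption) compares the exact lower tail $P(suc(T) < (1 - \delta)T\alpha')$ with the bound $e^{-T\alpha'\delta^2/2}$, and computes the smallest $T$ with $(1 - \delta)T\alpha' \ge I'$.

```python
import math


def exact_lower_tail(T: int, a: float, delta: float) -> float:
    """P(Bin(T, a) < (1 - delta) * T * a)."""
    cutoff = (1.0 - delta) * T * a
    return sum(math.comb(T, k) * a**k * (1.0 - a) ** (T - k)
               for k in range(T + 1) if k < cutoff)


def chernoff_lower_tail(T: int, a: float, delta: float) -> float:
    """Multiplicative Chernoff bound exp(-T a delta^2 / 2)."""
    return math.exp(-T * a * delta**2 / 2.0)


def generations_for(i_prime: int, a: float, delta: float) -> int:
    """Smallest T with (1 - delta) * T * a >= I'."""
    return math.ceil(i_prime / ((1.0 - delta) * a))


print(exact_lower_tail(100, 0.5, 0.5), chernoff_lower_tail(100, 0.5, 0.5))
```

Because the bound decays exponentially in $T\alpha'$, choosing $T$ a factor $\Theta(\sqrt{\ln\ln n})$ above the $O(\ln N)$ successes required makes the failure probability super-polynomially small.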

Here, it is worth noting that the above proposition, together with the truncation selection that removes the worst $N$ individuals among the $2N$ parents and offspring, brings us very close to the final conclusion of the theorem. More precisely, once the population has been $(B, p, 1)$-taken over by individuals belonging to $S_0$ (for any $p = 0, \ldots, n - 2$), the truncation selection operator will no longer accept offspring that belong to neither $S^*$ nor $S_0$, since these offspring have lower fitness than all $N$ parents belonging to $S_0$. Hence, the only way to generate individuals belonging to $S^*$ is via direct mutation of parent individuals belonging to $S_0$. Next, we show that the probability of such a mutation is super-polynomially small.



2 When the number of successes $suc(T)$ reaches $I'$, the population has been $(B, 0, 1)$-taken over. Afterwards, if no individual belonging to $S^*$ has been generated (the corresponding duration is investigated by Proposition 10.2), the event "success" still makes sense, except that the population has entered the $(B, p, 1)$-takeover process, where $p > 0$.
Proof of Proposition 10.4

Optimistically, here we can assume that the bits between the $(\ln^2 n/8)$-th and $(\ln^2 n + 3)$-th bits of each individual have become "free-riders" [1] when the population has just been $(B, p, 1)$-taken over (where $p \in \{0, \ldots, \ln^2 n/8 - 3\}$), i.e., the values of these bits are not influenced by selection pressure, and only genetic drift is considered (though some of them are very likely to be influenced by selection pressure, which tends to preserve 0-bits). Dividing the evolution after the $\eta$-th generation into two stages, we need to prove the following propositions:

10.4.1 No later than the $(\eta + a)$-th generation, the population has been $(B, p, 1)$-taken over by individuals with no less than $\ln^2 n/8$ consecutive leading 0-bits, where $a = \ln^5 n$.

10.4.2 After the $\eta$-th generation but before the population has been taken over by individuals with no less than $\ln^2 n/8$ consecutive leading 0-bits (see footnote 3), with an overwhelming probability at least $\ln^2 n/32$ free-riders between the $(\ln^2 n/8)$-th and $(\ln^2 n + 3)$-th bits of each individual in the population will take the value 0.

By Lemma 5 we can study the $(B, p, \epsilon)$-takeover processes ($p < \ln^2 n/8 - 2$) and the corresponding $(B, p, \epsilon)$-upgrade probabilities, and prove that the number of generations sufficient for the population to be $(B, \ln^2 n/8 - 2, 1)$-taken over by individuals with more than $\ln^2 n/8$ consecutive leading 0-bits is bounded from above by

$$
\frac{\ln^2 n}{8}\left(\bar{\eta}^{(B)}_{\epsilon} + \max_{p < \ln^2 n/8 - 2}\left\{1/u^{(B)}_{p,\epsilon}\right\}\right) \le \frac{\ln^2 n}{8}\left(\bar{\eta}^{(B)}_{\epsilon} + \ln^3 n\right)
$$

with an overwhelming probability, where $\bar{\eta}^{(B)}_{\epsilon} = o(\ln N \cdot \ln\ln n) = o(\ln n \cdot \ln\ln n)$. The proof follows the proof idea of Proposition 10.3 to study the takeover time $\bar{\eta}^{(B)}_{\epsilon}$, and then applies Lemma 5 directly; for the sake of brevity, we do not provide the details here. Moreover, the above result, along with the condition $N = \Omega(n/\ln n)$, implies that $\eta + (\ln^2 n/8)(\bar{\eta}^{(B)}_{\epsilon} + \ln^3 n) < \eta + \ln^5 n = \eta + a$. Hence, we obtain Proposition 10.4.1.

On the other hand, recall that when the population has just been taken over by the individuals belonging to $S_0$ (i.e., $(B, \ln^2 n/8 - 2, 1)$-takeover), with an overwhelming probability there are still at least $\ln^2 n/16$ free-riders, between the $(\ln^2 n/8)$-th and $(\ln^2 n + 3)$-th bits of each individual, taking the value 0. Within $\ln^5 n$ generations, each of the free-riders will receive at most $a = \ln^5 n$ mutations. Given any individual at the $\eta$-th generation, there are at least $\ln^2 n/16$ free-riders (between its $(\ln^2 n/8)$-th and $(\ln^2 n + 3)$-th bits) taking the value 0. For each of those free-riders, the probability that its value does not change within the $a$ mutations is at least $(1 - 1/n)^a \ge 1 - \ln^5 n/n$. According to Chernoff bounds, among the aforementioned $\ln^2 n/16$ free-riders of each individual, with the overwhelming probability $1 - e^{-\Theta(\ln^2 n)} = 1 - n^{-\Theta(\ln n)}$, there are still $\ln^2 n/32$ free-riders consistently taking the value 0 between the $\eta$-th generation and the $(\eta + a)$-th generation. As a consequence, since the population size $N$ is a polynomial function of the problem size $n$, a union bound over the individuals yields Proposition 10.4.2.
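The free-rider argument can be made concrete numerically. The sketch below treats the $\lfloor \ln^2 n/16\rfloor$ free-riders of one individual as independent, each staying 0 over $a = \ln^5 n$ mutations with probability $(1 - 1/n)^a$, and computes the probability that at least $\lceil \ln^2 n/32\rceil$ of them remain 0. The rounding choices, independence, and function names are assumptions of the sketch. Because $a/n = \ln^5 n/n$ is only small for large $n$, the example uses $n = 10^9$ and $n = 10^{12}$.

```python
import math


def keep_prob(n: int) -> float:
    """(1 - 1/n)^a with a = ln^5 n: a free-rider is never flipped in a mutations."""
    a = math.log(n) ** 5
    return math.exp(a * math.log1p(-1.0 / n))


def enough_zero_riders(n: int) -> float:
    """P(at least ceil(ln^2 n / 32) of floor(ln^2 n / 16) 0-valued free-riders
    stay 0), treating the free-riders as independent."""
    L = math.log(n) ** 2
    riders, need = math.floor(L / 16), math.ceil(L / 32)
    p = keep_prob(n)
    return sum(math.comb(riders, k) * p**k * (1.0 - p) ** (riders - k)
               for k in range(need, riders + 1))


for n in (10**9, 10**12):
    print(n, keep_prob(n), enough_zero_riders(n))
```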

Summarizing the above discussion on the number of free-riders, we know that before the population has been taken over by individuals with no less than $\ln^2 n/8$ consecutive leading 0-bits (but after the $\eta$-th generation), the probability for each individual to generate an offspring belonging to $S^*$ is at most $1/n^{\ln^2 n/32}$. On the other hand, after the population has been taken over by individuals with no less than $\ln^2 n/8$ consecutive leading 0-bits, the probability for each individual to generate an offspring belonging to $S^*$ is at most $1/n^{\ln^2 n/8}$, since at that time each individual in the population contains at least $\ln^2 n/8$ consecutive leading 0-bits, all of which must be flipped. Further, since the population size $N$ is polynomial, the probability that some individual in the population generates an offspring belonging to $S^*$ is still super-polynomially close to 0.

Combining the results presented in Propositions 10.1, 10.2, 10.3 and 10.4, we have proven the theorem. □

3 $(B, \ln^2 n/8 - 2, 1)$-takeover, which happens no later than the $(\eta + a)$-th generation with an overwhelming probability.
7 Discussion



So far we have seen three analytical results concerning the performance of a population-based EA with different population sizes. They show that as the population size increases, the solvable rate of the (N + N) EA on the TrapZeros problem drops to an extremely low level. Although the study in this paper is carried out on a specific problem using a specific type of EA, it has much wider implications. The analytical results presented in this paper demonstrate an interesting problem characteristic under which population-based EAs may perform poorly: when a problem has an attraction basin leading to some local optimum, and the individuals in this basin have higher fitness than most other individuals, a large population may not be useful and may even become harmful, since it leads to a large probability of finding individuals in the local basin. The resultant takeover process in the mistaken basin quickly eliminates other promising individuals that may lead to the global optimum. After that, only large step sizes can help to find promising individuals again, resulting in a long runtime of the EA due to the small probability of getting close to the global optimum.

The weakness of population-based EAs without recombination on the above problem characteristic, shown in this paper, can partially be tackled by employing larger step sizes. For example, if an appropriate recombination strategy that can generate large step sizes is employed, the EA can probably provide much larger step sizes than bitwise mutation (with the commonly used mutation rate 1/n). Moreover, some adaptive/self-adaptive mutation schemes might also help to cope with the situation, since they can provide large step sizes for exploring the correct attraction basin and small step sizes for exploiting it. As a consequence, even if the whole population has been trapped in a mistaken basin, it is still possible to find promising individuals in other basins of attraction. The related investigations are left for future work.

8 Conclusion 

In this paper, we have investigated the performance of an (N + N) EA with different population sizes on a multimodal problem, namely TrapZeros. The theoretical results have revealed a problem characteristic that may lead to poor performance of population-based EAs, as discussed in the last section. This is the first time that the influence of population size on an (N + N) EA has been analyzed theoretically. In addition, the proposed solvable rate, which is an intrinsic feature extracted from the probability distribution of the first hitting time, offers an alternative measure of the performance of EAs.

Derived from a recently developed approach for analyzing EAs on unimodal problems [2] and following the well-known building block hypothesis, the takeover-upgrade technique used here characterizes the evolution within a single basin of attraction as repeated takeover-upgrade processes, which accumulate enough promising individuals and then generate better individuals from the accumulated ones. In this paper, the successful application of this technique to modeling population-based EAs on a multimodal problem sheds some light on analyzing more complicated population-based EAs with similar techniques. To utilize such techniques, an elaborate procedure, as shown in the proof of Theorem 9, is required to estimate the takeover times, which are directly related to the selection operator. In the future, we will further study EAs with different parent and offspring population sizes, i.e., $(\mu + \lambda)$ EAs. Moreover, we will combine these techniques with other state-of-the-art analytical tools, so as to gain more insight into the impact of other operators (e.g., recombination) and the corresponding parameter settings on the performance of EAs.
Appendix
Drift Analysis 

Drift analysis is a well-known technique for studying the time complexity of EAs [7], [9]. Formally, let $x^*$ be the unique optimum of the objective function, and let $V(X)$ be the function that measures the distance between the population $X$ and the optimum $x^*$. Then the one-step mean drift at the $t$-th generation of the EA, denoted by $\Delta(X, t)$, is given by

$$
\Delta(X, t) = E\left[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = X\right] = \sum_{Y \in \mathcal{E}} \left[V(X) - V(Y)\right] P(X, Y; t),
$$

where $\mathcal{E}$ is the whole population set and $P(X, Y; t)$ is the probability of moving from population $X$ to population $Y$ at the $t$-th generation. If no self-adaptive strategy is utilized in the EA, then the transition probability is not time-dependent:

$$
P(X, Y; t) = P(X, Y).
$$

Under this circumstance, we simply use the notation $\Delta(X)$ to represent the one-step mean drift. Meanwhile, if

$$
\Delta(X) = E\left[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = X\right] \ge 0
$$

holds for $t = 0, 1, \ldots$, then $\{V(\xi_t);\ t = 0, 1, \ldots\}$ is called a super-martingale. According to He and Yao [7], once we can estimate a lower bound of the one-step mean drift, we can obtain an upper bound of the mean first hitting time by the following lemma:

Lemma 11 (Drift Theorem [7]) Let $\{V(\xi_t);\ t = 0, 1, \ldots\}$ be a super-martingale describing an EA. If, for any time $t = 0, 1, 2, \ldots$ with $V(\xi_t) > 0$,

$$
E\left[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t\right] \ge c_l > 0,
$$

then the mean first hitting time satisfies

$$
E\left[\tau \mid \xi_0\right] \le \frac{V(\xi_0)}{c_l},
$$

where $\xi_0$ is the initial population of the EA.

One of the key steps when applying drift analysis is to specify an appropriate distance function $V(\cdot)$. The following lemma tells us that when the expected first hitting time of a homogeneous absorbing Markov chain, conditional on the starting state $X$, is defined as the distance function for the state $X$, the one-step mean drift equals 1:

Lemma 12 ([9]) Let $L = \{L_t;\ t = 0, 1, \ldots\}$ be a homogeneous absorbing Markov chain defined on the space $M$, and let $H \subset M$ be its absorbing subspace. For $L$, its first hitting time to $H$, which is formally defined by

$$
\tau = \min\{t \ge 0 : L_t \in H\},
$$

satisfies

$$
\begin{cases}
E[\tau \mid L_0 = X] = 0, & X \in H;\\
E[\tau \mid L_0 = X] - \sum_{Y \in M} P(X, Y)\, E[\tau \mid L_0 = Y] = 1, & X \notin H,
\end{cases}
$$

where $E[\tau \mid L_0 = X]$ is the expected first hitting time of the absorbing Markov chain $L$ starting with the initial state $L_0 = X$.
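Lemma 12, and the way Lemma 11 is applied with it, can be checked on a toy chain. The sketch below uses hypothetical function names, and its 3-state transition matrix is an arbitrary example rather than an EA. It solves for the expected hitting times with exact rational arithmetic and confirms that, when these times are used as the distance function, the one-step mean drift at every non-absorbing state is exactly 1, so Lemma 11 with $c_l = 1$ returns the hitting time itself.

```python
from fractions import Fraction


def expected_hitting_times(P, absorbing):
    """Solve E[tau | L_0 = X] for a finite absorbing Markov chain.

    P is a square matrix (list of lists of Fractions); `absorbing` is the set H.
    Uses E[tau|X] = 0 on H and E[tau|X] = 1 + sum_Y P(X,Y) E[tau|Y] off H,
    solved by Gauss-Jordan elimination over the transient states.
    """
    transient = [i for i in range(len(P)) if i not in absorbing]
    idx = {s: r for r, s in enumerate(transient)}
    size = len(transient)
    # Augmented matrix for (I - Q) t = 1, where Q is P restricted to transient states.
    A = [[(Fraction(1) if r == c else Fraction(0)) - P[s][transient[c]]
          for c in range(size)] + [Fraction(1)]
         for r, s in enumerate(transient)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        inv = 1 / A[col][col]
        A[col] = [v * inv for v in A[col]]
        for r in range(size):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    times = [Fraction(0)] * len(P)
    for s in transient:
        times[s] = A[idx[s]][size]
    return times


def one_step_drift(P, V, x):
    """Delta(X) = sum_Y [V(X) - V(Y)] P(X, Y)."""
    return sum((V[x] - V[y]) * P[x][y] for y in range(len(P)))


F = Fraction
P = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 3), F(1, 3), F(1, 3)], [F(0), F(0), F(1)]]
times = expected_hitting_times(P, {2})
print(times, [one_step_drift(P, times, x) for x in (0, 1)])
```

Exact fractions are used so that the drift comes out as exactly 1 rather than 1 up to rounding error.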

The proof of Lemma 4 will utilize the above lemmas.
Proof of Lemma 4

We offer the detailed proof for Eq. (5) in Lemma 4. Before bounding the mean of the first hitting time $\tau^{(A)}_{p_1,p_2}$, we have to study the $(A, k, \epsilon)$-takeover process ($p_1 < k < p_2 \le \ln^2 n$) by restricting the Markov chain to $E^{(A)}_k$ under the condition "no individual belonging to $S_0$ has ever been generated before the first time the EA reaches $E^{(A)}_{p_2}$". Formally, the above condition is represented by

$$
\bigwedge_{t=0}^{\tau^{(A)}_{p_1,p_2}} \left(S_0 \cap \xi_{t+1} = \emptyset\right).
$$

The rest of the proof is presented in the context of the above condition (for the sake of brevity we omit it when presenting mathematical details).

The analysis follows a pessimistic style, which is quite similar to the proof of Proposition 1 in [2]. If the number of LOIAs of a population $Z$ is smaller than $\epsilon N$, we ignore all potential emergences of advanced LOIAs. Once some advanced LOIA is generated before the number of LOIAs reaches $\epsilon N$, we assume pessimistically that each advanced LOIA is only a LOIA, i.e., it is replaced by some LOIA that is randomly selected from the LOIAs in the current population. In response to this step, the population $Z$ is transformed to $Y$ ($Y \in E^{(A)}_k$). Afterwards, according to the number $i$ of LOIAs, we consider two different cases, $\lceil \epsilon N\rceil \le i \le N$ and $1 \le i < \lceil \epsilon N\rceil$, and use the notation "$\to_k$" to represent the mapping from $\bigcup_{j>k} E^{(A)}_j$ to $E^{(A)}_k$:

• If $\lceil \epsilon N\rceil \le i \le N$, then $Z \to_k Y$. (An advanced LOIA is transformed to a LOIA.)

• If $1 \le i < \lceil \epsilon N\rceil$, then $Z \to_k Y'$, where $Y'$ is obtained by directly replacing the best $\lceil \epsilon N\rceil$ individuals of $Y$ with $\lceil \epsilon N\rceil$ randomly selected LOIAs, $Y' \in E^{(A)}_{k,\lceil \epsilon N\rceil}$. (We ignore the advanced LOIAs but consider that the population has been $(A, k, \epsilon)$-taken over by $\lceil \epsilon N\rceil$ LOIAs.)

The aim of the above transformations is to restrict the whole $(A, k, \epsilon)$-takeover process to the subspace $E^{(A)}_k$. The consequence is that we can utilize an auxiliary homogeneous absorbing Markov chain $(\zeta^{(k)}_i;\ i = 0, 1, \ldots)$ on $E^{(A)}_k$ to study the whole $(A, k, \epsilon)$-takeover process. The transition probabilities of the auxiliary Markov chain are given by

$$
\tilde{P}(X, Y) =
\begin{cases}
P(X, Y), & X, Y \in \bigcup_{i=1}^{\lceil \epsilon N\rceil - 1} E^{(A)}_{k,i};\\
P(X, Y) + \sum_{Z \to_k Y} P(X, Z), & X \in \bigcup_{i=1}^{\lceil \epsilon N\rceil - 1} E^{(A)}_{k,i},\ Y \in \bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{k,i};\\
0, & X \ne Y,\ X \in \bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{k,i};\\
1, & X = Y,\ X \in \bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{k,i},
\end{cases}
\tag{15}
$$

where $X, Y \in E^{(A)}_k$.



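According to the definitions of transition probabilities presented in Eq. (15), we have

$$
\sum_{Y \in E^{(A)}_k} \tilde{P}(X, Y) = 1.
$$

Obviously, $\bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{k,i}$ is absorbing in $(\zeta^{(k)}_i)$; the other subspaces are $E^{(A)}_{k,1}, \ldots, E^{(A)}_{k,\lceil \epsilon N\rceil - 1}$. Let $\eta^{(k)}(X)$ denote the expected first hitting time of $(\zeta^{(k)}_i)$ to its absorbing subspace when started from $X$ (i.e., the expected $(A, k, \epsilon)$-takeover time). According to Lemma 12, we know that for $X \in \bigcup_{i=1}^{\lceil \epsilon N\rceil - 1} E^{(A)}_{k,i}$,

$$
\sum_{Y \in E^{(A)}_k} \left[\eta^{(k)}(X) - \eta^{(k)}(Y)\right] \tilde{P}(X, Y) = 1. \tag{16}
$$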
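On the other hand, let $\xi_{t_k}$ be the population of the $t_k$-th generation. For $\xi_{t_k} \in E^{(A)}_k$, we define its first hitting time to the population set $\bigcup_{j>k} E^{(A)}_j$, starting from $\xi_{t_k}$:

$$
\tau^{(A)}_{k,k+1} = \min\left\{t \ge 0 : \xi_{t_k + t} \in \bigcup_{j>k} E^{(A)}_j\right\}.
$$

The mean first hitting time (to the set $\bigcup_{j>k} E^{(A)}_j$) of the EA starting with the population $\xi_{t_k} = X$ is given by

$$
E\left[\tau^{(A)}_{k,k+1} \mid \xi_{t_k} = X\right] = \sum_{t=0}^{\infty} t \cdot P\left(\tau^{(A)}_{k,k+1} = t \mid \xi_{t_k} = X\right).
$$

Let $\pi_{t_k}(\cdot)$ specify the probability distribution of the population at the $t_k$-th generation (note that we have assumed that $\xi_{t_k} \in E^{(A)}_k$); then

$$
E\left[\tau^{(A)}_{k,k+1}\right] = \sum_{X \in E^{(A)}_k} \pi_{t_k}(X)\, E\left[\tau^{(A)}_{k,k+1} \mid \xi_{t_k} = X\right]
$$

is called the mean first hitting time to the population set $\bigcup_{j>k} E^{(A)}_j$. Now we utilize drift analysis to bound $E[\tau^{(A)}_{k,k+1}]$.

First, we define a distance function $V^{(k)}(X)$ for $X \in E^{(A)}_k \cup \bigcup_{j>k} E^{(A)}_j$ ($p_1 < k < p_2 \le \ln^2 n$):

$$
V^{(k)}(X) =
\begin{cases}
\eta^{(k)}(X) + 1/u^{(A)}_{k,\epsilon}, & X \in E^{(A)}_k;\\
0, & X \in \bigcup_{j>k} E^{(A)}_j.
\end{cases}
$$

For each population set $E^{(A)}_k$ ($k = 1, \ldots, n$), we show that the one-step mean drift of the populations is no less than some positive constant. Given a population $X \in E^{(A)}_k$, let $\Delta(X)$ be its one-step mean drift. The estimation of the one-step mean drift involves two different cases (Eq. (17) and Eq. (19)).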



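For $X \in \bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{k,i}$,

$$
\Delta(X) = \sum_{Y \in E^{(A)}_k} \left[V^{(k)}(X) - V^{(k)}(Y)\right] P(X, Y) + \sum_{Y \in \bigcup_{j>k} E^{(A)}_j} \left[V^{(k)}(X) - V^{(k)}(Y)\right] P(X, Y) \tag{17}
$$

$$
\ge \frac{1}{u^{(A)}_{k,\epsilon}} \sum_{Y \in \bigcup_{j>k} E^{(A)}_j} P(X, Y) \ge 1 \tag{18}
$$

holds since the truncation selection operator always preserves the best $N$ individuals among the $2N$ individuals in the parent and offspring populations of each generation: a population $Y \in E^{(A)}_k$ reached from $X$ still contains at least $\lceil \epsilon N\rceil$ LOIAs, so $V^{(k)}(Y) = V^{(k)}(X) = 1/u^{(A)}_{k,\epsilon}$, while the probability of reaching $\bigcup_{j>k} E^{(A)}_j$ is at least $u^{(A)}_{k,\epsilon}$.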
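For $X \in \bigcup_{i=1}^{\lceil \epsilon N\rceil - 1} E^{(A)}_{k,i}$ and $Y \in E^{(A)}_k \cup \bigcup_{j>k} E^{(A)}_j$,

$$
\begin{aligned}
\Delta(X) &= \sum_{Y \in E^{(A)}_k} \left[V^{(k)}(X) - V^{(k)}(Y)\right] P(X, Y) + \sum_{Z \in \bigcup_{j>k} E^{(A)}_j} \left[V^{(k)}(X) - V^{(k)}(Z)\right] P(X, Z)\\
&= \sum_{Y \in E^{(A)}_k} \left[\eta^{(k)}(X) - \eta^{(k)}(Y)\right] P(X, Y) + \sum_{Z \in \bigcup_{j>k} E^{(A)}_j} \left[\eta^{(k)}(X) + 1/u^{(A)}_{k,\epsilon}\right] P(X, Z)\\
&\ge \sum_{Y \in E^{(A)}_k} \left[\eta^{(k)}(X) - \eta^{(k)}(Y)\right] P(X, Y) + \sum_{Y \in \bigcup_{i=\lceil \epsilon N\rceil}^{N} E^{(A)}_{k,i}}\ \sum_{Z \to_k Y} \left[\eta^{(k)}(X) - \eta^{(k)}(Y)\right] P(X, Z)\\
&= \sum_{Y \in E^{(A)}_k} \left[\eta^{(k)}(X) - \eta^{(k)}(Y)\right] \tilde{P}(X, Y) = 1,
\end{aligned}
\tag{19}
$$

where the inequality uses $\eta^{(k)}(Y) = 0$ for every $Y$ in the absorbing subspace of $(\zeta^{(k)}_i)$, and the last equality holds because of Eq. (16). The results presented in Eq. (17) and Eq. (19) show that the one-step mean drift of $X$ ($X \in E^{(A)}_k$) is always no less than 1. According to Lemma 11 and the definition of $V^{(k)}$, we further have

$$
E\left[\tau^{(A)}_{k,k+1} \mid \xi_{t_k} = X\right] \le V^{(k)}(X) = \eta^{(k)}(X) + 1/u^{(A)}_{k,\epsilon} = O\!\left(1/u^{(A)}_{k,\epsilon} + \bar{\eta}^{(A)}_{\epsilon}\right).
$$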



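Then the first hitting time from $E^{(A)}_k$ to $\bigcup_{j>k} E^{(A)}_j$ satisfies

$$
E\left[\tau^{(A)}_{k,k+1}\right] = \sum_{X \in E^{(A)}_k} \pi_{t_k}(X)\, E\left[\tau^{(A)}_{k,k+1} \mid \xi_{t_k} = X\right] = O\!\left(1/u^{(A)}_{k,\epsilon} + \bar{\eta}^{(A)}_{\epsilon}\right).
$$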



For $X \in E^{(A)}_{k'}$ ($k' \ne k$, $p_1 < k' < p_2 \le \ln^2 n$), we can obtain the same upper bound by the same techniques. In the meantime, the truncation selection operator of the EA always preserves the best individual in every generation (ESS); thus once the population of the EA has reached $\bigcup_{j>k} E^{(A)}_j$, it will never return to $E^{(A)}_k$ again. Hence, combining the conditional means of $\tau^{(A)}_{k,k+1}$ with respect to different $k$, we have proven the first part of Lemma 4:

$$
E\left[\tau^{(A)}_{p_1,p_2} \,\middle|\, \bigwedge_{t=0}^{\tau^{(A)}_{p_1,p_2}} \left(S_0 \cap \xi_{t+1} = \emptyset\right)\right] = O\!\left(\sum_{k=p_1+1}^{p_2} \left(\bar{\eta}^{(A)}_{\epsilon} + 1/u^{(A)}_{k,\epsilon}\right)\right).
$$

The second part of Lemma 4 (Eq. (8)) can be proven in a similar way, except that the condition "no individual belonging to $S_0$ has ever been generated before the first time the EA reaches $E^{(A)}_{p_2}$" is no longer required. □



Proof of Lemma 5

Proof. The idea of the proof is straightforward: in the worst case, the (N + N) EA has to go through $p_2 - p_1$ repeated takeover-upgrade processes in which no individual belonging to $S^*$ has ever been generated before the first time the EA reaches $E^{(B)}_{p_2}$. Instead of considering the mean of the first hitting time from $E^{(B)}_{p_1}$ to $E^{(B)}_{p_2}$, we consider an upper bound of the first hitting time which holds with an overwhelming probability, i.e., a probability that is super-polynomially close to 1. Since truncation selection is adopted by the EA, such an upper bound can be obtained by directly combining the upper bounds of the takeover and upgrade times. By definition, with an overwhelming probability, $\bar{\eta}^{(B)}_{\epsilon}$ is an upper bound of the $(B, i, \epsilon)$-takeover time $\eta^{(B)}_{i,\epsilon}(\xi_{t_i})$ for any population $\xi_{t_i} \in E^{(B)}_i$, where $i \in \{0, \ldots, n - 2\}$.

In addition, to prove the lemma, we have to prove that $\ln^3 n$ is an upper bound of the upgrade time with an overwhelming probability. Linking the above proposition to a random variable $X$ obeying the geometric distribution with parameter $u^{(B)}_{i,\epsilon}$, the probability that $X$ is bounded from above by $\ln^3 n$ can be calculated as follows:



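$$
P[X \le \ln^3 n] = \sum_{k=1}^{\ln^3 n} P[X = k] = 1 - \sum_{k=\ln^3 n+1}^{\infty} P[X = k] = 1 - \sum_{k=\ln^3 n+1}^{\infty} u^{(B)}_{i,\epsilon}\left(1 - u^{(B)}_{i,\epsilon}\right)^{k-1} = 1 - \left(1 - u^{(B)}_{i,\epsilon}\right)^{\ln^3 n}. \tag{20}
$$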



Concerning the upgrade probability $u^{(B)}_{i,\epsilon}$, we can estimate its lower bound under the condition $N = \Omega(n/\ln n)$. Given a population $\xi_{t_i} \in E^{(B)}_i$ containing at least $\epsilon N$ LOIBs, $u^{(B)}_{i,\epsilon}$ is bounded from below:



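$$u^{(B)}_{i,\epsilon} \ge 1 - \left(1 - \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1}\right)^{\epsilon N} \ge 1 - \left(1 - \frac{1}{en}\right)^{\epsilon N} \ge 1 - e^{-\epsilon N/(en)}.$$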



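Noting that the population size $N$ satisfies $N = \Omega(n/\ln n)$, the above inequality implies

$$u^{(B)}_{i,\epsilon} \ge 1 - e^{-\Omega(1/\ln n)} = \Omega\left(\frac{1}{\ln n}\right).$$

By inserting the above inequality into Eq. (20), we obtain

$$P\left[X \le \ln^3 n\right] = 1 - \left(1 - u^{(B)}_{i,\epsilon}\right)^{\ln^3 n} \ge 1 - \left(1 - \Omega\left(\frac{1}{\ln n}\right)\right)^{\ln^3 n} \ge 1 - e^{-\Omega(\ln^2 n)}, \tag{21}$$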



which is super-polynomially close to 1. Given a population $\xi_{t_i}$ that contains at least $\epsilon N$ LOIBs at the $t_i$-th generation ($p_1 \le i < p_2$), the $(B, i, \epsilon)$-upgrade time $\zeta^{(B)}_{i,\epsilon}(\xi_{t_i})$ obeys the geometric distribution with parameter $u^{(B)}_{i,\epsilon}$. Hence, by replacing the variable $X$ with $\zeta^{(B)}_{i,\epsilon}(\xi_{t_i})$ in Eq. (21), we know that $\zeta^{(B)}_{i,\epsilon}(\xi_{t_i})$ is bounded from above by $\ln^3 n$ with an overwhelming probability. Noting that $P\big(\tau^{(B)}_{i,\epsilon}(\xi_{t_i}) \le \bar{\tau}^{(B)}_{\epsilon}\big) \ge 1 - 1/\mathrm{SuperPoly}(n)$ for all $i \in \{0, \ldots, n-2\}$ and all $\xi_{t_i} \in E^{(B)}_i$ (the condition of Lemma 5), and following the proof idea presented at the beginning of this proof, we have proven that the upper bound of the first hitting time shown in Eq. (10) holds with an overwhelming probability, i.e., a probability that is super-polynomially close to 1. □
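To give a feel for how fast the failure probability in Eq. (21) vanishes, the following sketch evaluates it numerically in the smallest admissible regime $N = n/\ln n$. It rests on an assumption that this section does not spell out: that each of the (at least) $\epsilon N$ LOIBs is upgraded independently with probability at least $1/(en)$, so that $1 - u^{(B)}_{i,\epsilon} \le (1 - 1/(en))^{\epsilon N}$. The value $\epsilon = 1/2$ and the function names are hypothetical choices for illustration, and $\ln^3 n$ is rounded down to an integer number of generations.

```python
import math
from typing import Tuple


def non_upgrade_prob_upper(n: int, N: float, eps: float) -> float:
    """Upper bound q >= 1 - u on the probability that no LOIB is upgraded.

    Illustrative assumption: at least eps*N LOIBs, each upgraded independently
    with probability at least 1/(e*n).
    """
    return (1.0 - 1.0 / (math.e * n)) ** (eps * N)


def failure_prob_upper(n: int, eps: float) -> Tuple[float, float]:
    """Return (lower bound on u * ln n, upper bound on P[X > floor(ln^3 n)])."""
    ln_n = math.log(n)
    N = n / ln_n                           # smallest N allowed by N = Omega(n / ln n)
    q = non_upgrade_prob_upper(n, N, eps)  # q >= 1 - u, computed directly
    m = math.floor(ln_n ** 3)              # generations allowed for the upgrade
    return (1.0 - q) * ln_n, q ** m        # P[X > m] = (1 - u)^m <= q^m, cf. Eq. (20)


for n in (10**3, 10**4, 10**5, 10**6):
    scaled_u, fail = failure_prob_upper(n, eps=0.5)
    print(f"n={n:>7}  u*ln(n) >= {scaled_u:.4f}  "
          f"P[X > ln^3 n] <= {fail:.2e}  (compare n^-2 = {n ** -2:.0e})")
```

Under this assumption, $u^{(B)}_{i,\epsilon}\ln n$ stays close to $\epsilon/e$, in line with $u^{(B)}_{i,\epsilon} = \Omega(1/\ln n)$, while the failure probability decays like $e^{-\Theta(\ln^2 n)}$. For the smallest values of $n$ shown it is still larger than $n^{-2}$, which is why the statement is asymptotic.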
Proof of Lemma 7

Let $Z_{t+1}$ be the number of LOIAs generated at the $(t+1)$-th generation. Given that $c \in (0, 1)$ is a constant, for $X_t < N/(1 + ch)$ we have



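$$\begin{aligned}
&P\left[Z_{t+1} \ge c \cdot E\left[Z_{t+1} \mid X_t, S_0 \cap \xi_{t+1} = \emptyset\right] \,\middle|\, X_t < \tfrac{N}{1+ch},\ S_0 \cap \xi_{t+1} = \emptyset\right] \\
&= P\left[E\left[Z_{t+1} \mid X_t, S_0 \cap \xi_{t+1} = \emptyset\right] - Z_{t+1} \le (1-c) \cdot E\left[Z_{t+1} \mid X_t, S_0 \cap \xi_{t+1} = \emptyset\right] \,\middle|\, X_t < \tfrac{N}{1+ch},\ S_0 \cap \xi_{t+1} = \emptyset\right] \\
&\ge 1 - P\left[\left|Z_{t+1} - E\left[Z_{t+1} \mid X_t, S_0 \cap \xi_{t+1} = \emptyset\right]\right| > (1-c) \cdot E\left[Z_{t+1} \mid X_t, S_0 \cap \xi_{t+1} = \emptyset\right] \,\middle|\, X_t < \tfrac{N}{1+ch},\ S_0 \cap \xi_{t+1} = \emptyset\right] \\
&\ge 1 - \frac{\mathrm{Var}\left[Z_{t+1} \mid X_t < \frac{N}{1+ch},\ S_0 \cap \xi_{t+1} = \emptyset\right]}{(1-c)^2 \, E^2\left[Z_{t+1} \mid X_t < \frac{N}{1+ch},\ S_0 \cap \xi_{t+1} = \emptyset\right]} \\
&= 1 - \frac{X_t h (1-h)}{(1-c)^2 X_t^2 h^2} = 1 - \frac{1-h}{(1-c)^2 X_t h},
\end{aligned} \tag{22}$$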



where in Eq. (22) we utilize the facts that the newly generated LOIAs can all be accepted (since $X_t < N/(1 + ch)$ holds) and that the number of newly generated LOIAs obeys the binomial distribution [15], with mean $X_t h$ and variance $X_t h(1-h)$. Hence, by considering the total number of LOIAs at the end of the $(t+1)$-th generation, we know that



$$P\left[X_{t+1} \ge (1 + ch) X_t \,\middle|\, X_t < \frac{N}{1+ch},\ S_0 \cap \xi_{t+1} = \emptyset\right] \ge 1 - \frac{1-h}{(1-c)^2 X_t h} \tag{23}$$

holds, where $c \in (0, 1)$ is a constant.

□ 
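The Chebyshev step in Eq. (22) can be checked numerically. Assuming, as the proof does, that $Z_{t+1}$ is binomially distributed with $X_t$ trials and success probability $h$ (mean $X_t h$, variance $X_t h(1-h)$), the sketch below computes the exact probability $P[Z_{t+1} \ge c X_t h]$ and compares it with the lower bound $1 - (1-h)/((1-c)^2 X_t h)$. The parameter values are arbitrary illustrations rather than values from the paper, and the helper names are hypothetical.

```python
import math


def binom_tail(k_min: float, trials: int, h: float) -> float:
    """Exact P[Z >= k_min] for Z ~ Binomial(trials, h)."""
    start = max(math.ceil(k_min), 0)
    return sum(
        math.comb(trials, k) * h**k * (1.0 - h) ** (trials - k)
        for k in range(start, trials + 1)
    )


def chebyshev_bound(x_t: int, h: float, c: float) -> float:
    """Right-hand side of Eq. (22): 1 - (1 - h) / ((1 - c)^2 * X_t * h)."""
    return 1.0 - (1.0 - h) / ((1.0 - c) ** 2 * x_t * h)


# Keep trials <= 1000 so that math.comb(...) still converts to a float.
for x_t in (50, 200, 1000):
    for h in (0.05, 0.2):
        c = 0.5
        exact = binom_tail(c * x_t * h, x_t, h)  # P[Z_{t+1} >= c * E[Z_{t+1}]]
        bound = chebyshev_bound(x_t, h, c)
        print(f"X_t={x_t:>4} h={h:.2f}  exact={exact:.6f}  bound={bound:.6f}")
        assert exact >= bound
```

Because the bound depends on $X_t$ and $h$ only through $1/(X_t h)$, it becomes informative once the expected number of new LOIAs per generation is large; for small $X_t h$ it can even be negative and hence vacuous, as the row with $X_t = 50$ and $h = 0.05$ shows.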